IBM Network Advisor Best Practices and Deployment Guide_v3.10
IBM Network Advisor Deployment and Best Practices Guide
Version: 3.10
Owner: Jim Olson
Author: Eric Block, David Lutz & Sudharsan S Vangal
http://ibm.biz/brocdesignbp
Table of Contents
Document history
Document Location
Approvals
Distribution
Introduction
When to use Network Advisor
Best Practices Recommendations
Regular Tasks for SAN Health
Daily
Weekly
Monthly
Quarterly
Network Advisor
Server Sizing and Configuration
Server and Client Ports
Downloading IBM Network Advisor
Installing IBM Network Advisor
Launching the Remote Client
User Account Management
Server Management Console
IBM Network Advisor Configuration Screen
Backup and Restore Configuration Data
Switch Backup and Restore
Restoring a switch configuration for a selected device
Scheduling Switch Backups
Server Data Backup and Restore
Viewing the backup status
Server Data Restore
Event Logs
Collect SupportSave
Network Advisor Supportsave
Supportsave Manual Collection
Supportsave Scheduled Collection
Event notification
Call Home
SNMP
Fabric Watch
Reasons to Implement Fabric Watch
Configuring Fabric Watch
Bottleneck Credit Tools
Enabling bottleneck credit tools
Bottleneck Detection
Recommendations
Suggested Bottleneck Settings
FOS 6.3
FOS 6.4
FOS 7.0
Implementation
Enable Bottleneckmon via GUI
Enable Bottleneckmon via CLI
How Bottlenecks are reported in Network Advisor
Port Fencing
Implementation
Adding thresholds (Violation types):
Assigning thresholds to ports:
Unblocking a Port
Removing Thresholds
Brocade Fabric Vision
Monitoring and Alerting Policy Suite (MAPS)
MAPS Licensing Requirements and Software Prerequisites
Differences between Fabric Watch and MAPS configurations
Converting from Fabric Watch to MAPS
Initial MAPS setup
Importing MAPS configuration
Replicating a policy to other devices
MAPS and Bottleneck Monitor
Enable MAPS in Network Advisor
Activate MAPS Policy from Network Advisor
View the Parameters in a Policy
Network Advisor Dashboards
Brocade SAN Health Report
Instructions For Usage
Zoning
Conclusion
References
Document history
Document Location
The source of the document can be found in the Team Room, located at:
Database Name: TBD
Server Name: TBD
File Name: IBM Network Advisor Deployment Guide V3.03.doc
Please address any questions to:
Revision History
Date of this revision: 01/16/2014
Date of next revision: TBD
Revision Number | Revision Date | Summary of Changes | Changes marked
1.0 | 6/11/12 | Initial document creation | No
1.1 | | |
1.2 | 7/10/12 | Revised to meet requirements for standardized deployment | No
1.3 | 7/25/12 | Added SNMP and Performance information | No
1.4 | 7/29/12 | Added Zoning Information | No
1.5 | 8/1/12 | Edited document for added emphasis to key points, as well as alteration to technical terms, per Art Scrimo | No
1.6 | 9/14/12 | Added SAN Health information to Health Check section. Added information to Fault Management and SNMP section. Added Event Logs section. Added Switch Backup and Restore. | No
1.7 | 9/24/12 | Removed Linux from Network Advisor server options | No
1.8 | 9/25/12 | Incorporated Best Practices into guide | No
1.9 | 10/1/12 | Expanded on SAN Health Tool section | No
2.0 | 10/4/12 | Moved Security and Authentication to SAN Design Guide | No
2.1 | 10/24/12 | Added User Account Management section | No
2.2 | 10/30/12 | Removed duplicate switch recovery information. Edited overall content for flow/clarity | No
2.3 | 11/6/12 | Added links for navigating document more efficiently | No
2.4 | 11/15/12 | Added Reference section. Edited Bottleneck and Port Fencing sections for Network Advisor (vs CLI) | No
2.5 | 12/19/12 | Added information for SNMPv3, Call Home, Automatic Trace Dumps | No
2.6 | 1/14/13 | Minor edits to wording | No
2.7 | 3/15/13 | Updated Port Fencing information based on alert severity changing in FOS v7.0.2c (per John Juenemann 20130313 Update Initiative) | No
2.8 | 7/17/13 | Added additional detail/instruction for SAN Health usage | No
2.9 | 01/14/14 | Per Jim Olson and Kirby Dahman, changed Fabric Watch F_port Class thresholds to 25 for two alerts: Link Reset and State Change | No
2.91 | 01/16/14 | Modified appearance of Fabric Watch alerts table for better clarity/detail (no FW values changed) | No
3.0 | 05/28/14 | Added a new section for Flow Vision MAPS. Pages 50-54 (Updated as per Jim Olson's directive to include Fabric Vision) | No
3.1 | 06/01/14 | Added section for Fabric Vision introduction. Added table for MAPS Threshold Values | No
3.2 | 06/02/14 | Corrected MAPS implementation section for more clarity. | No
3.3 | 06/03/14 | Added Moderate Policy also for the MAPS Threshold values. Corrected FOS version requirement for MAPS. | No
3.4 | 06/03/14 | Added section for Replicating Policies to Other devices as per Tron's request. | No
3.5 | 06/05/14 | Corrected the Threshold policies. Added configuration screenshots for INA | No
3.6 | 06/17/14 | Provided more clarity on MAPS advantages and features over Fabric Watch | Yes
3.7 | 07/14/2014 | Included updates provided by David Lutz on the Fabric Watch. | Yes
3.8 | 08/15/2014 | Restructured document and created new section for recommendations. MAPS section revised to show recommended MAPS implementation, Fabric Vision section updated to provide better clarity. Section added for collecting supportsave files from Network Advisor, and updated the SAN regular tasks. | Yes
3.10 | 11/10/2014 | Added link to server configuration section for more detail | Yes
Approvals
This document requires the following approvals:
Name | Title
Jim Olson | Distinguished Engineer
Distribution
This document has been distributed to:
Name | Title
Jim Olson | Distinguished Engineer
Ann Corrao | Distinguished Engineer
John Juenemann | Senior Technical Staff Member (STSM)
Karen Haberli | Program Manager
Eric Block | Storage Architect
Sudharsan S Vangal | Storage Administrator
Introduction
The purpose of this document is to present a set of guidelines that incorporate IBM best practices for deploying IBM Network Advisor (a.k.a. Brocade Network Advisor). This guide should act as a reference point in establishing consistent, standard deployments across IBM environments.
The best practices noted in this guide present some of the more advanced features of Brocade Fabric OS (FOS), for example Fabric Watch, Bottleneck Detection, and Port Fencing. Additional best practices are provided for hardware selection, zoning, and performing scheduled health-related checks and tasks in the SAN.
The guidance found in this document should provide you with an efficient, economic, and effective process by which to deploy and begin managing IBM Network Advisor.
NOTE: All deployments should be done using the Enterprise version of IBM Network Advisor.
When to use Network Advisor
All SAN Fabric installations using Brocade technology should deploy IBM Network Advisor.
If you are currently managing your Brocade SAN with DCFM, you should upgrade to Network Advisor in the following cases:
All 16Gb installations (or prior to upgrading to 16Gb)
Prior to upgrading any Brocade FOS product to level 7.x or above
NOTE: DCFM is not qualified or supported for management of switches operating with FOS v7.0 and later firmware versions. You must first upgrade DCFM to Network Advisor 11.1 or later if you are planning to upgrade devices to FOS v7.0, or you risk losing management connectivity.
Best Practices Recommendations
The following recommendations are based on best practice recommendations from Brocade and IBM technical support groups.
Install and use Network Advisor to manage all switches. See Network Advisor
Setup Switch configuration backup. See Backup and Restore Configuration Data
Enable Bottleneck Credit Recovery Tools. See Bottleneck Credit Tools
Configure Call Home and SNMP or email event notification. See Event notification
Switches running FOS 7.2 or higher setup MAPS. See Monitoring and Alerting Policy Suite (MAPS)
Switches running FOS 7.1 or lower setup Fabric Watch. See Fabric Watch
Configure and enable Bottleneck Detection. See Bottleneck Detection
Configure Network Advisor Dashboards. See Network Advisor Dashboards
Implement and follow regular SAN health tasks. See Regular Tasks for SAN Health
Regular Tasks for SAN Health
NOTE: The tasks below should be considered mandatory in any Brocade SAN environment. Consistent execution of these tasks will help to ensure your fabrics are operating optimally, and that you have adequate backup data available for unexpected impacts to the SAN. Additionally, performing these tasks will provide you with information that can be extremely useful in recognizing trends and in targeting sources of problems during troubleshooting.
Daily
Review of Event Logs
The Master Log should be reviewed daily by the operations team as part of the health check process. Network Advisor's Master Log lists all events and alerts that have occurred in the SAN.
View specific logs by selecting an option from the Monitor menu's Logs submenu. The following logs can be found here: Audit Log, Product Event Log, Fabric Log, FICON Log, Product Status Log, Security Log, Syslog Log. Fabric Watch, MAPS, Bottleneck Detection, and Port Fencing alerts are processed like other alerts in the environment and can be found in the IBM Network Advisor Master Log.
Weekly
Backup Switches
Collect a set of configuration files in case they are required to restore the switch configuration. See the Switch Backup and Restore section for how to do this.
Collect Supportsaves
Collect a complete set of supportsave files from all switches before clearing the switch counters.
This provides a set of switch logs from before the counters were cleared, in case they are required for problem determination (PD).
It also provides a set of switch logs that can be used as a baseline.
See Supportsave Scheduled Collection.
Clear Switch Counters
Counters that are never cleared are hard to troubleshoot, because you have no frame of reference for when the error counters on ports actually increased.
For this reason the Brocade best practice is to clear the counters on a known schedule, so that any error counts seen are known to represent recent issues.
NOTE: Any time new devices are added to the SAN or cabling changes are made, it is common for ports to detect errors. These errors should be cleared any time fabric changes are made.
Action
Automate a counter clear on all switches that runs on Sunday evening (suggested time: 6 PM local time). This should happen after all the normally scheduled weekend changes are complete and before the Sunday night / Monday morning production workloads begin.
Commands to be run:
statsclear
slotstatsclear
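The weekly counter clear described above could be scripted along the following lines. This is a minimal sketch, not a supported tool: it assumes an ssh client is available on the machine running the script, that key-based login to each switch is already set up, and that the account name (sanops here) and the switch addresses are placeholders you would replace. Only the two command names come from this guide.

```python
import subprocess
from typing import List

# The two counter-clear commands named in this guide.
CLEAR_COMMANDS = ["statsclear", "slotstatsclear"]


def build_ssh_commands(switches: List[str], user: str) -> List[List[str]]:
    """Return one ssh argv list per (switch, command) pair."""
    return [["ssh", f"{user}@{switch}", command]
            for switch in switches
            for command in CLEAR_COMMANDS]


def clear_counters(switches: List[str], user: str, dry_run: bool = True) -> List[str]:
    """Run (or, with dry_run=True, only list) the counter-clear commands."""
    issued = []
    for argv in build_ssh_commands(switches, user):
        if not dry_run:
            # Assumption: non-interactive, key-based ssh login is configured.
            subprocess.run(argv, check=True, timeout=60)
        issued.append(" ".join(argv))
    return issued


if __name__ == "__main__":
    # Hypothetical switch addresses and account name, for illustration only.
    for line in clear_counters(["192.0.2.10", "192.0.2.11"], user="sanops"):
        print(line)
```

A script like this would then be scheduled for Sunday evening with whatever scheduler is standard in your environment; the dry-run default makes it safe to review the command list before enabling it.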
Monthly
Review switch logs for marginal links or other potential switch issues.
The following are some of the key metrics to check when reviewing supportsave files.
PORTERRSHOW
c3timeout / disc c3
Frame discards occur when frames sit in the frame buffers too long, indicating that there are issues sending the frames.
Note: On older switch code levels, a portstatsshow for any port with C3 discards may be required to determine whether the discards are tx or rx.
tx discards are frames that cannot be sent to the attached device. Check for link issues, then check the attached device.
rx discards are frames that cannot be sent to the next hop in the switch. Check whether other ports on the switch have tx discards. Use the framelog command to determine the destination for rx frame discards.
crc_err
This counter is incremented when a frame with a bad CRC passes through the port. Determine where the CRC error originated by checking other ports and other switches for crc g_eof errors.
crc g_eof
This counter is incremented when a frame is detected with a CRC error. This is the first port to detect the CRC error. It is typically caused by an optical issue, often cables. Check the cables, and possibly replace or swap them. Replace optics (HBA, SFP) on the attached device.
too shrt / too long / bad eof
Indication of frame errors. Typically caused by an optical issue, often cables. Check the cables, and possibly replace or swap them.
loss sync / loss sig
Loss of sync and loss of signal typically occur when the optical link cycles, usually at the attached device. Typically no action is required unless counts are extremely high or occur at unexpected times.
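The PORTERRSHOW guidance above can be turned into a simple checklist function. The sketch below is illustrative only: it does not parse real switch output, and the dictionary keys (disc_c3_tx, crc_g_eof, and so on) are hypothetical names chosen for this example. You would populate them from however you extract the counters from your supportsave data.

```python
from typing import Dict, List


def triage_port(counters: Dict[str, int]) -> List[str]:
    """Map one port's error counters to the follow-up actions in this guide.

    The key names are assumptions made for this sketch, not FOS field names.
    """
    hints: List[str] = []
    if counters.get("disc_c3_tx", 0) > 0:
        hints.append("tx C3 discards: check the link, then the attached device")
    if counters.get("disc_c3_rx", 0) > 0:
        hints.append("rx C3 discards: look for tx discards on other ports; "
                     "use framelog to find the destination")
    if counters.get("crc_g_eof", 0) > 0:
        hints.append("crc g_eof: first port to detect the CRC error; "
                     "check or swap cables, then optics")
    elif counters.get("crc_err", 0) > 0:
        hints.append("crc_err only: check other ports and switches "
                     "for crc g_eof to find the source")
    if any(counters.get(key, 0) > 0 for key in ("too_shrt", "too_long", "bad_eof")):
        hints.append("frame errors: check or swap cables")
    return hints


if __name__ == "__main__":
    for hint in triage_port({"crc_err": 12, "disc_c3_rx": 3}):
        print(hint)
```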
SFPSHOW
The primary metric is Rx power, which shows the amount of light the SFP is receiving.
Typically SFPs transmit at around -2 to -3 dBm (roughly 630 to 500 uW), so for short cables the receive power levels should be similar. Longer cable lengths will result in lower receive light levels, which is not considered an issue. In general, receive levels should not drop below -10 dBm (100 uW) unless it's an extremely long cable run.
In general, compare light levels to other cable runs of similar length; noticeably lower levels than on comparable cables indicate a cabling issue.
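Because optical power is quoted both in dB relative to one milliwatt and in microwatts, it can help to convert between the two when comparing readings. The sketch below uses the standard relation P(uW) = 1000 x 10^(dBm / 10); the function names and the use of -10 as a default floor (taken from the guidance above) are choices made for this example.

```python
import math

# Receive-power floor suggested in this guide, in dBm.
RX_FLOOR_DBM = -10.0


def dbm_to_uw(dbm: float) -> float:
    """Convert optical power from dBm to microwatts."""
    return 1000.0 * 10 ** (dbm / 10.0)


def uw_to_dbm(uw: float) -> float:
    """Convert optical power from microwatts to dBm."""
    return 10.0 * math.log10(uw / 1000.0)


def rx_power_ok(rx_dbm: float, floor_dbm: float = RX_FLOOR_DBM) -> bool:
    """True when the receive level is at or above the floor."""
    return rx_dbm >= floor_dbm


if __name__ == "__main__":
    for level in (-2.0, -3.0, -10.0, -12.5):
        print(f"{level:6.1f} dBm = {dbm_to_uw(level):7.1f} uW  ok={rx_power_ok(level)}")
```

Remember that the floor is only a rule of thumb; as noted above, a long cable run can legitimately read lower, so comparison with similar-length links remains the better test.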
ERRDUMP
The errdump log should be reviewed for messages that indicate issues. These can range from CDR-xxxx, C2-xxxx, and C3-xxxx messages indicating credit loss, to messages showing excessive network login attempts, to switch hardware issues.
FABRICLOG
Check the fabric log for signs of ports doing repeated Link Resets, ports going offline/online, or repeated fabric rebuilds.
Quarterly
Run the Brocade SAN Health Report. See Brocade SAN Health Report.
Network Advisor
Server Sizing and Configuration
IBM Network Advisor Sizing Requirements | Small | Medium | Large
Number of Fabrics | 8 | 16 | 24
Number of Domains | 20 | 60 | 120
Number of Switch Ports | 2000 | 5000 | 9000
Number of Device Ports | 5000 | 10000 | 20000
Number of Access Gateways | 20 | 30 | 40
Server CPU | Dual Core 2GHz | Quad Core 2GHz | Quad Core 2GHz
Server Memory | 6GB | 8GB | 12GB
Server Disk (OS) | 60GB | 80GB | 100GB
Server Disk (App/DB) | 100GB | 100GB | 100GB
Server Disk (Backup) | 100GB | 100GB | 100GB
Server Operating System | Windows 2008 R2 64-bit | Windows 2008 R2 64-bit | Windows 2008 R2 64-bit
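One way to apply the sizing table is to pick the smallest size class whose limits cover every count in your environment. The sketch below encodes the five scale rows of the table, read as Small, Medium, and Large columns; the function name and dictionary keys are hypothetical, and treating the limits as inclusive is an assumption.

```python
from typing import Dict, Optional

# Scale limits from the sizing table, read as Small / Medium / Large columns.
SIZING_LIMITS: Dict[str, Dict[str, int]] = {
    "Small": {"fabrics": 8, "domains": 20, "switch_ports": 2000,
              "device_ports": 5000, "access_gateways": 20},
    "Medium": {"fabrics": 16, "domains": 60, "switch_ports": 5000,
               "device_ports": 10000, "access_gateways": 30},
    "Large": {"fabrics": 24, "domains": 120, "switch_ports": 9000,
              "device_ports": 20000, "access_gateways": 40},
}


def pick_size(counts: Dict[str, int]) -> Optional[str]:
    """Return the smallest size class covering all counts, or None if too big."""
    for size, limits in SIZING_LIMITS.items():  # insertion order: Small first
        if all(counts.get(key, 0) <= limit for key, limit in limits.items()):
            return size
    return None


if __name__ == "__main__":
    environment = {"fabrics": 4, "domains": 30, "switch_ports": 1800,
                   "device_ports": 3000, "access_gateways": 2}
    print(pick_size(environment))
```

In this example the domain count alone pushes the environment out of the Small class, which is why every row needs to be checked rather than just the port counts.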
If further information is needed on server sizing and configuration, please see: http://www.brocade.com/downloads/documents/product_manuals/NetworkAdvisor/NetworkAdvisor_InstallGd_v1230.pdf
Additional Requirements
We want to do everything we can to prevent issues in the SAN from impacting our management interface. Should the SAN experience an unexpected degradation or failure, we need to ensure our ability to access Network Advisor is unaffected. This ability could be severely compromised or lost if our main tools (OS, application) reside on the SAN. Therefore, the following points must be followed when performing a best practice installation of the IBM Network Advisor server:
Dedicated / Stand-alone server
NOTE: A Virtualized server may be used, however it must follow same requirements as a dedicated/stand-alone server
The server must be dedicated for Network Advisor
No other applications installed/running
The server OS must not boot from SAN
Install OS on local disk (internal to server)
Network Advisor must not be installed on SAN
Install Network Advisor Server/DB on local disk (internal to server)
Server should be partitioned for three drives: one for the OS, one for the Application, and one for Backup Data
Backup Data needs to be on a physically separate drive
Browser Requirements
Firefox under Windows
Oracle JRE 1.6.0 update 24 for Network Advisor and Web Tools
Server and Client Ports
The Management application has two parts: the Server and the Client. The Server is installed on one machine and stores device-related information; it does not have a user interface. To view information through a user interface, you must log in to the Server through a Client. The Server and Clients may reside on the same machine, or on separate machines.
In some cases, a network may utilize virtual private network (VPN) or firewall technology, which can prohibit communication between Switches and the Servers or Clients. In other words, a Server or Client can find a Switch which appears to log in, but is immediately logged out because the Switch cannot reach the Server or Client. To resolve this issue, check to determine if the ports in the table below need to be opened up in the firewall.
Port Number | Ports | Transport | Description | Communication Path | Open in Firewall
20 [1] | FTP Port (Control) | TCP | FTP Control port for internal FTP server | Client-Server, Switch-Server | Yes
21 [1, 2] | FTP Port (Data) | TCP | FTP Data port for internal FTP server | Client-Server, Switch-Server | Yes
22 [1] | SSH or Secure Telnet | TCP | Sectelnet port from server to switch/client to switch | Server-Switch, Client-Switch | Yes
23 [1] | Telnet | TCP | Telnet port from server/client to switch | Server-Switch, Client-Switch | Yes
25 | SMTP Server Port | TCP | SMTP Server port for Email communication | Server-SMTP Server | Yes
49 | TACACS+ Authentication port | TCP | TACACS+ server port for authentication if TACACS+ is chosen as an external authentication | Server-TACACS+ Server | Yes
80 | Jboss.web.http.port | TCP | Non-SSL HTTP/1.1 connector port | Client-Server | Yes
80 [3, 4] | Switch http | TCP | Switch non-SSL http port for http and CAL communication | Server-Switch, Client-Switch | Yes
161 [1] | SNMP Port | UDP | Default SNMP Port | Server-Switch | Yes
162 [3] | Snmp.trap.port | UDP | Default SNMP Trap Port | Switch-Server | Yes
389 | LDAP Authentication Server Port | TCP | LDAP server port for authentication if LDAP is chosen as an external authentication | Server-LDAP Server | Yes
443 [3, 4, 5] | Switch https | TCP | Switch SSL http port for https and CAL communication | Server-Switch, Client-Switch | Yes
514 [6] | Syslog Port | UDP | Default Syslog Port | Switch-Server | Yes
636 | LDAP Authentication SSL Port | UDP | LDAP server port for authentication if LDAP is chosen as an external authentication and SSL is enabled | Server-LDAP Server | Yes
1024 [1, 7] | MPI | TCP | MPI Trap recipient port | Switch-Server | Yes
1812 | RADIUS Authentication Server Port | TCP | RADIUS server port for authentication if RADIUS is chosen as an external authentication | Server-RADIUS Server | Yes
2048 [1, 9] | MPI | TCP | MPI discovery NMRU port | Server-Switch | Yes
2049 [1, 5, 7, 9] | MPI | TCP | MPI discovery NMRU port for SSL | Server-Switch | Yes
2638 [8] | Database port (Enforced during install) | TCP | Port used by database | Server-Database, Remote ODBC-Database | Yes
4430 [1, 5, 7] | MPI | TCP | XML-RPC port for SSL | Server-Switch | Yes
5988 | SMI Agent port | TCP | SMI Agent port | SMI Agent Server-Client | Yes
5989 | SMI Agent port with SSL enabled | TCP | SMI Agent port with SSL enabled | SMI Agent Server-Client | Yes
8080 [1, 7] | MPI | TCP | XML-RPC port/HTTP port | Server-Switch | Yes
24600 [11] | Jboss.naming.jnp.port - port 0 | TCP | Bootstrap JNP service port | Client-Server | Yes
24601 | Jboss.connector.ejb3.port - port 1 | TCP | EJB3 connector port | Client-Server | Yes
24602 | Jboss.connector.bisocket.port - port 2 | TCP | Bisocket connector port | Client-Server | Yes
24603 | Jboss.connector.bisocket.secondary.port - port 3 | TCP | Bisocket connector secondary port | Client-Server | Yes
24604 [5] | Jboss.connector.sslbisocket.port - port 4 | TCP | SSL Bisocket connector port | Client-Server | Yes
24605 [5] | Jboss.connector.sslbisocket.secondary.port - port 5 | TCP | SSL Bisocket connector secondary port | Client-Server | Yes
24606 | Smp.registry.port - port 6 | TCP | RMI registry port | Client-Server | Yes
24607 | Smp.server.export.port - port 7 | TCP | RMI export port | Client-Server | Yes
24608 | Smp.server.cliProxyListeningport - port 8 | TCP | CLI proxy telnet port | Client-Server | Yes
24609 | Jboss.naming.rmi.port - port 9 | TCP | RMI naming service port | Client-Server | Yes
24610 | Jboss.jrmp.invoker.port - port 10 | TCP | RMI/JRMP invoker port | Client-Server | Yes
24611 | Jboss.pooled.invoker.port - port 11 | TCP | Pooled invoker port | Client-Server | Yes
24612 | Jboss.connector.socket.port - port 12 | TCP | Socket invoker port | Server | No
24613 | Jboss.web.ajp.port - port 13 | TCP | AJP 1.3 connector port | Server | No
24614 | Jboss.web.service.port - port 14 | TCP | Web service port | Server | No
24615 | Connector.bind.port - port 15 | TCP | Port to listen for requests | Server | No
32768-65535 | Ephemeral ports | UDP | Ephemeral transport protocol ports | Switch-Server | Yes
55555 [10] | Client Export Port | TCP | Client port to which server pushes the M-EOS device Element Manager updates | Server-Client | Yes
55556 | Launch in Context (LIC) client hand shaking port | TCP | Client port used to check if a Management application client opened using LIC is running on the same host. NOTE: If this port is in use, the application uses the next available port | Client | No
Notes to port superscripts:
1 Port is not configurable (either in the switch or the Management server).
2 Every FTP session requires an additional port which is randomly picked. If the firewall is enabled, then FTP operations (used for firmware download, technical support, firmware import from client-server, and so on) will fail.
3 Ports configurable in the switch and the Management server. Port must be the same for all switches managed by the Management server.
4 Ports used to launch the Web Tools application for Fabric OS switches from the Management client. This is applicable only when the Fabric OS version is earlier than 6.1.1.
5 Port used for SSL communication. If SSL is enabled, you must open 443*, 24604, and 24605 in the firewall. If SSL is not enabled, port 80* must be open in the firewall and 443*, 24604, and 24605 can be closed. An asterisk (*) denotes the default web server port number. If you set the web server port number to a port other than the default, you must open that port in the firewall.
6 The Syslog listening port is configurable in the Management server. The switch always sends syslog messages to port 514. If you have any other syslog daemon on the Management server machine already listening to 514, then the Management Server can be configured to listen to a different port. You must manually configure relay in existing syslogd to forward the syslog messages to the Management Server listening on the configured port.
7 Ports used for communicating with M-EOSn (M-i10K) directors. M-i10K always uses NMRU over SSL (2049). M-i10K always uses 8080 for http requests (firmware download, configuration backup/ restore, data collection). If M-EOSn firmware version is less than 9.1 the Management application uses 8080 for XML-RPC requests (discovery and asset collection). If the M-EOSn firmware version is more than 9.1 then it always uses SSL port (4430) for XML-RPC.
8 Port must be opened in firewall for the server when the remote ODBC client needs to talk to the Management database server (Only for EE). The same port is used by the Management server to database server (local). This is not used by the Management client.
9 Ports used for communicating with M-EOS (excluding M-i10K) switches (only required when the Management server manages M-EOS switches).
10 Port should be opened in firewall in the Management client to allow communication between server and client (only applicable for M-EOS switches). If this port is not opened in the firewall, then the M-EOS element manager does not receive updates. Also if multiple clients are opened, it will try to use the next available port (55556). So if there are n clients opened in the same machine then you must open 55555 (configurable) to 55555 + n ports in the firewall.
11 The Management server tries to find a contiguous block of 16 ports from the starting port configured (for example, 24600); if any port in this range is not available for the Management application, then you must provide a new starting port. Note that Port 1 to Port 15 in Ports column of the table above are not separately configurable and those ports vary based on the starting port number configuration (specified as Port 0 in the above table). The port numbers mentioned in the table above are the default ports (for example, when 24600 is selected as the starting port number).
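Notes 10 and 11 describe two port ranges that depend on a configurable starting value, which makes them easy to get wrong in a firewall request. The sketch below computes both. The property names are copied from the Ports column of the table above; the function names are hypothetical, and this is an illustration of the arithmetic in the notes rather than anything the Management application provides.

```python
from typing import Dict, List

# Property names for Port 0 through Port 15, in table order.
BLOCK_PROPERTIES: List[str] = [
    "Jboss.naming.jnp.port",
    "Jboss.connector.ejb3.port",
    "Jboss.connector.bisocket.port",
    "Jboss.connector.bisocket.secondary.port",
    "Jboss.connector.sslbisocket.port",
    "Jboss.connector.sslbisocket.secondary.port",
    "Smp.registry.port",
    "Smp.server.export.port",
    "Smp.server.cliProxyListeningport",
    "Jboss.naming.rmi.port",
    "Jboss.jrmp.invoker.port",
    "Jboss.pooled.invoker.port",
    "Jboss.connector.socket.port",
    "Jboss.web.ajp.port",
    "Jboss.web.service.port",
    "Connector.bind.port",
]


def server_port_block(start: int = 24600) -> Dict[str, int]:
    """Map each property to its port in the contiguous 16-port block (note 11)."""
    return {name: start + offset for offset, name in enumerate(BLOCK_PROPERTIES)}


def client_export_ports(n_clients: int, base: int = 55555) -> List[int]:
    """Ports base through base + n_clients, as described in note 10."""
    return list(range(base, base + n_clients + 1))


if __name__ == "__main__":
    for name, port in server_port_block(24600).items():
        print(f"{port}  {name}")
    print(client_export_ports(2))
```

If the starting port is changed from the default, rerunning the calculation with the new value gives the complete list to hand to the firewall team.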
Downloading IBM Network Advisor
The following link may be used to access IBM Network Advisor software:
http://www-03.ibm.com/systems/networking/switches/san/b-type/na/index.html
1. Under Learn more, select IBM Network Advisor Trial web page
2. This will redirect you to the ibm.brocadeassist.com site
3. In the Product Downloads window, expand Brocade Network Advisor 11.1.x and select the current recommended version to download
Installing IBM Network Advisor
The following provides screenshot-by-screenshot guidance for an installation of the IBM Network Advisor (Enterprise edition).
1. Once you've downloaded the application, select the executable file and click Install. This will bring up the Introduction screen...
2. Accept License...
3. Select Install Folder (Do Not install to the root directory, usually C:\)...
4. Note Pre-Install Summary and select Install...
5. Once installation is complete, click Done to complete the Network Advisor configuration...
6. IBM Network Advisor Configuration Welcome screen...
7. We are performing a new install, so we select No, as we are not migrating any data or settings...
8. Select SAN with SMI Agent
9. You will need to have a Serial Number and License Key available at this point if you plan to perform a permanent install (these should have been provided when you purchased IBM Network Advisor). Otherwise, you can opt for a 75-day trial...
10. Enter required Serial and License Key...
11. As part of the Standard Deployment, we will select Internal FTP Server...
12. Add required information...
13. Most configurations will maintain the below defaults...
14. Most configurations will keep default. However, these settings can be changed later via the Server Management Console (in the Services tab) noted below.
15. Select the network size based on the scaling you used to size your server...
16. Verify your configuration...
17. At this point installation/configuration is complete and you are ready to start the client...
18. Server and Client startup...
19. Following the initial login below, you will need to change the Administrator password from the default. Once you have logged in, you can do this from Server > Users.
Launching the Remote Client
To launch a remote client, complete the following steps:
1. Open a web browser and enter the IP address of the Management application server in the Address bar. The Management application web start screen displays.
The default web server port number is 80; if SSL is enabled, it is 443. You must enter the web server port number in addition to the IP address (e.g. IP_Address:Port_Number).
2. Click the Management application web start link.
The Log In dialog box displays.
3. Enter your user name and password.
The defaults are Administrator and password, respectively. If you migrated from a previous release, your username and password do not change.
4. Select or clear the Save password check box to choose whether you want the application to remember your password the next time you log in.
5. Click Login.
6. Click OK on the Login Banner dialog box. The Management application displays.
User Account Management
Centralized authentication is the IBM best practice for managing user accounts. Regardless of which method of authentication you use (RADIUS, TACACS+, LDAP, local), you will need to work with your security team to ensure you are meeting the account and IBM requirements.
ITCS104
The ITCS104 Technical Security Standards for SAN Switches may be found here.
User Management
IBM Network Advisor provides a thorough role-based access control (RBAC) feature to define detailed roles and privileges for SAN administrators, as follows:
Provides current authentication and authorization configuration details
Consolidated list of user profiles, roles, and areas of responsibility (AOR)
Provisions to add, modify, and duplicate user profiles, roles, and AORs
Account State column shows active or lock out reasons
Access restricted to users assigned the User Management privilege with Read-Only/Read-Write permission
No limit on the number of users added to Brocade Network Advisor; the number of users depends on the database storage limit
Local authentication (local password database), Windows domain login, LDAP, RADIUS, and TACACS+ are supported. Automatic failover to a secondary authentication method can be configured, in case a remote primary authentication method becomes unavailable.
Privileges: Provide access to the features in the Management application.
Role: Group of selected privileges. A role can be assigned to one or more Management application users who need access to the same menu options.
AOR (Areas of Responsibility): Used to define a user's device access permissions. AORs have the ability to group fabrics, hosts, and other products. AORs can be modified, deleted, or duplicated.
Default and User-defined Accounts
In addition to the default accounts (root, factory, admin, and user), Fabric OS supports up to 252 additional user-defined accounts in each logical switch (domain). These accounts expand your ability to track account access and audit administrative activities. See the Fabric OS Administrator's Guide below for in-depth detail on setting up these accounts.
NOTE: The default user accounts (root, factory, admin, and user) need to be properly secured. Change the default passwords for root and factory and keep these separate and secure. The root and factory accounts provide a level of access beyond the admin account.
Work with your security team in securing and managing the Root and Factory accounts
Work with your security team to define non-default Admin and User accounts with the same access for your users
Disable the default Admin and User accounts
AAA (Authentication, Authorization, and Accounting) Settings
The Authentication function enables you to configure an authentication server and establish authentication policies. Authentication is configured to the local database by default. If you configure primary authentication to a RADIUS server, a TACACS+ server, an LDAP server, or switch authentication, you can also configure secondary authentication to the local server. When you log in to the Management application, if the primary server is unavailable, the Management application tries the next configured primary server. If all primary servers are unavailable, the Management application falls back to the secondary authentication. Fallback can occur when the server is unavailable, authentication fails, or the user is not found.
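The primary/secondary behaviour described above can be pictured with a small model. This is a sketch of the decision flow only, written for illustration: the Outcome values, the login function, and the fallback_on parameter are all hypothetical names, and which conditions trigger fallback in a real deployment depends on the policy selected in the AAA settings.

```python
from enum import Enum
from typing import Callable, FrozenSet, Optional, Sequence


class Outcome(Enum):
    OK = "ok"
    AUTH_FAILED = "authentication failed"
    USER_NOT_FOUND = "user not found"
    UNAVAILABLE = "server unavailable"


# An authenticator takes (user, password) and reports an Outcome.
Authenticator = Callable[[str, str], Outcome]


def login(user: str, password: str,
          primaries: Sequence[Authenticator],
          secondary: Optional[Authenticator],
          fallback_on: FrozenSet[Outcome] = frozenset({Outcome.UNAVAILABLE})) -> Outcome:
    """Try each primary in order, then fall back to the secondary if allowed."""
    last = Outcome.UNAVAILABLE
    for authenticate in primaries:
        last = authenticate(user, password)
        if last is not Outcome.UNAVAILABLE:
            break  # a primary server answered; do not try further primaries
    if last is Outcome.OK:
        return last
    if secondary is not None and last in fallback_on:
        return secondary(user, password)
    return last


if __name__ == "__main__":
    def down(user: str, password: str) -> Outcome:
        return Outcome.UNAVAILABLE

    def local_db(user: str, password: str) -> Outcome:
        return Outcome.OK

    print(login("operator", "secret", [down, down], local_db))
```

Passing a wider fallback_on set (for example, including AUTH_FAILED and USER_NOT_FOUND) models the more permissive policies mentioned in the last sentence of the paragraph above.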
Configuring authentication may be performed through the Network Advisor Server Management Console. See the Server Management Console section of the Network Advisor User Manual for details on setting up RADIUS, TACACS+, LDAP, etc. authentication methods.
Server Management Console
The Server Management Console (Start > Programs > IBM Network Advisor 11.1.x > Server Management Console) may be used to restart services, change port settings, restore data, and upload technical support information. We will go through a few of these in the screenshots that follow...
From the Services tab, you can start, stop, refresh, and restart services on the server.
From the Ports tab, you can change the Management application server or web server port numbers.
From the AAA Settings tab, you can configure different authentication methods (LDAP or RADIUS, etc.), and establish authentication policies.
From the Restore tab, you can restore server application data. Application: Server > Options > Server Backup.
NOTE: The Restore Path is what you set above in the Server Data Backup section (E:\Backup).
From the Technical Support Information tab, you can collect information for technical support.
IBM Network Advisor Configuration Screen
Should you need to change one of the settings in the screen below, you may access it via: Start > Programs > IBM Network Advisor 11.1.x > IBM Network Advisor Configuration
Backup and Restore Configuration Data
Switch Backup and Restore
Saving switch configurations
Saving switch configurations is only supported on Fabric OS switches. To save the switch configuration on more than one switch at a time, you must have the Enhanced Group Management license.
Configuration files are uploaded from the selected switches and stored in individual files. Files are named with the convention cfg_fabricName_switchName_domainID.
1. Select Configure > Configuration > Save. The Save Switch Configurations dialog box displays.
2. Select the switches for which you want to save configuration files from Available Switches.
3. Click the right arrow to move the selected switches to Selected Switches.
4. Click OK. Configuration files from the selected switches are saved to the repository.
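When scripting around saved configuration files, the naming convention described above can be reproduced with a small helper. This is a minimal sketch in Python; the function names and the sample fabric and switch names are hypothetical, and it assumes nothing about which characters Network Advisor allows in each field.

```python
def config_file_name(fabric_name: str, switch_name: str, domain_id: int) -> str:
    """Build a name in the cfg_fabricName_switchName_domainID form."""
    return f"cfg_{fabric_name}_{switch_name}_{domain_id}"


def domain_id_from_name(file_name: str) -> int:
    """Recover the domain ID, which is the last underscore-separated field."""
    # Fabric and switch names may themselves contain underscores, so only
    # the trailing field can be split off unambiguously.
    return int(file_name.rsplit("_", 1)[1])


if __name__ == "__main__":
    name = config_file_name("FabricA", "edge_sw01", 10)  # hypothetical names
    print(name, domain_id_from_name(name))
```

Only the domain ID is parsed back out, because the fabric and switch names cannot be separated reliably when either contains an underscore.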
Restoring a switch configuration for a selected device
The Restore Switch Configuration dialog box enables you to download a previously saved switch configuration to a selected device. Stored configurations are linked to the switch WWN; therefore, if the IP address or switch name is changed and then rediscovered, the Switch Configuration Repository dialog box displays the new switch name and IP address for the old configuration. If you delete a fabric or switch from discovery, the configuration remains in the repository until you delete it manually.
1. Right-click a device in the Product List or the Connectivity Map, and select Configuration > Configuration Repository. The Switch Configuration Repository dialog box displays.
2. Select the configuration you want to restore, and click Restore.
The configuration is downloaded to the device. If necessary, the restoration process prompts you to disable and reboot the device before the restoration begins. This lets you determine whether the restoration should be performed immediately or at a later time. If you confirm the restoration, the entire configuration is restored; you cannot perform selective download for specific configuration sections.
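The WWN-keyed behavior of the repository described above can be illustrated with a small model. This is a sketch in Python, not Network Advisor's implementation: the class and method names are hypothetical, and it keeps a single configuration per switch for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class StoredConfig:
    wwn: str            # switch WWN: the key that never changes
    switch_name: str    # refreshed when the switch is rediscovered
    ip_address: str     # refreshed when the switch is rediscovered
    payload: str        # saved configuration contents


@dataclass
class ConfigRepository:
    configs: dict = field(default_factory=dict)    # wwn -> StoredConfig
    discovered: set = field(default_factory=set)   # WWNs currently discovered

    def save(self, wwn, switch_name, ip_address, payload):
        self.discovered.add(wwn)
        self.configs[wwn] = StoredConfig(wwn, switch_name, ip_address, payload)

    def rediscover(self, wwn, switch_name, ip_address):
        """A renamed or re-addressed switch keeps its old configuration."""
        self.discovered.add(wwn)
        if wwn in self.configs:
            self.configs[wwn].switch_name = switch_name
            self.configs[wwn].ip_address = ip_address

    def remove_from_discovery(self, wwn):
        """Dropping a switch from discovery leaves its saved config in place."""
        self.discovered.discard(wwn)

    def delete_config(self, wwn):
        """Only an explicit delete removes the stored configuration."""
        self.configs.pop(wwn, None)
```

Keying on the WWN is what lets an old configuration follow a switch through a rename or an IP address change.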
Scheduling Switch Backups
The Enhanced Group Management (EGM) license must be activated on a switch to perform this procedure and to use the supportSave module.
If a periodic backup is scheduled at the SAN level, that backup will apply to all switches from all discovered fabrics. Any new fabrics being discovered are automatically added to the list of fabrics to be backed up.
If a backup is scheduled for more than one fabric and some of the fabrics contain common members, the backup will include the unique switch configuration values obtained from the fabrics.
You can schedule a backup of one or more switch configurations. The configuration files are stored in the Management application database.
1. Right-click a device in the Product List or the Connectivity Map, and select Configure > Configuration > Schedule Backup.
The Schedule Backup of Switch Configurations dialog box displays.
2. Click the Enable scheduled backup check box.
3. Set the Schedule parameters:
- The desired Frequency for backup operations (select Weekly).
- The day of the week, chosen for low utilization (e.g. Sunday).
- The Time (hour, minute) you want the backup to run.
- The maximum age allowed before you Purge Backups. The number of purge days should be at least one day more than the selected backup frequency.
The backup purge thread runs every day at 12:30 PM and deletes all backup configurations that exceed the maximum age allowed.
4. Choose one of the following options to determine the scope of the backup.
- Select the Backup all fabrics check box to back up all switch configurations of discovered switches in all fabrics.
- Clear the Backup all fabrics check box and select the specific fabric check boxes in the Selected Fabrics table to back up individual fabrics.
If any switches do not have the EGM license, a message displays. Click OK to enable backup on the switches with the EGM license.
5. Click OK.
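The purge-age rule in step 3 can be checked with a few lines of arithmetic. The sketch below is illustrative Python, not part of Network Advisor; the function names and the sample date are assumptions. It applies the "at least one day more than the backup frequency" rule and lists which backups a daily purge would delete.

```python
from datetime import date, timedelta


def min_purge_days(backup_frequency_days: int) -> int:
    """Purge age should be at least one day more than the backup frequency."""
    return backup_frequency_days + 1


def backups_to_purge(backup_dates, today, max_age_days):
    """Return the backup dates whose age exceeds the maximum age allowed."""
    return [d for d in backup_dates if (today - d).days > max_age_days]


if __name__ == "__main__":
    weekly = 7
    max_age = min_purge_days(weekly)      # 8 days for a weekly backup
    today = date(2024, 1, 21)             # hypothetical date
    backups = [today - timedelta(days=n) for n in (0, 7, 14)]
    print(backups_to_purge(backups, today, max_age))
```

With a weekly schedule and an 8-day purge age, the backup from 14 days ago is purged while the current and previous weekly backups are kept.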
Server Data Backup and Restore
Network Advisor helps you protect your data by backing it up automatically. The data can then be restored as necessary.
What is backed up?
If we set our backup for the D:\ drive (or whatever the backup drive is), the following files/data will reside in D:\Backup, as follows:
- Backup\databases contains database and log files.
- Backup\data contains M-EOS switches Element Manager data files (including Dump files, Data collection progress files, Director/Switch firmware files, FAF files, Switch technical SupportSave, and Switch backup files) and Fabric OS miscellaneous files.
- Backup\conf contains the Management application configuration files.
- Backup\cimom contains the SMIA configuration files.
Configuring backup to a hard drive
NOTE: This requires a hard drive. The drive should not be the same physical drive on which the Operating System or the Management application is installed.
To configure the backup function to a hard drive, complete the following steps (screenshot below for reference).
1. Select Server > Options. The Options dialog box displays.
2. Select Server Backup in the Category list. The currently defined directory displays in the Backup Output Directory field.
3. Select the Enable Backup check box, if necessary.
4. Choose the following option:
Select the Include FTP Root directory check box.
In selecting the FTP Root directory, the FTP Root sub-directories, Technical Support and Trace Dump, are selected automatically and you cannot clear the sub-directory selections.
5. Enter the time (using a 24-hour clock) you want the backup process to begin in the Next Backup Start Time Hours and Minutes fields.
6. Select an interval from the Backup Interval drop-down list to set how often backup occurs.
7. Browse to the hard drive and directory to which you want to back up your data (this should be a separate physical drive).
8. Click Apply or OK.
The application verifies that the backup device exists and that the server can write to it. If the device does not exist or is not writable, an error message displays that states you have entered an invalid device. Click OK to go back to the Options dialog box and fix the error. Backup occurs, if needed, at the interval you specified.
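The same kind of check can be run by hand before pointing the backup at a new location. This is a minimal Python sketch of an "exists and is writable" test using the standard library; the function name and the example path are hypothetical, and it is not the check Network Advisor itself performs.

```python
import os


def backup_target_is_valid(path: str) -> bool:
    """True if the path is an existing directory the current user can write to."""
    return os.path.isdir(path) and os.access(path, os.W_OK)


if __name__ == "__main__":
    # Hypothetical backup directory on a separate physical drive.
    print(backup_target_is_valid(r"D:\Backup"))
```

Run it under the same account as the Network Advisor services, since write permission is evaluated for the calling user.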
Enabling backup
Backup is enabled by default. However, if it has been disabled, complete the following steps to enable the function.
1. Select Server > Options.
The Options dialog box displays.
2. Select Server Backup in the Category list.
3. Select the Enable Backup check box.
4. Click Apply or OK.
Viewing the backup status
The Management application enables you to view the backup status at a glance by providing a backup status icon on the Status Bar. The following table illustrates and describes the icons that indicate the current status of the backup function.
Server Data Restore
This can be performed via the Restore tab described in the Server Management Console section (above).
Event Logs
You can view all events that take place through the Master Log at the bottom of the main window. You can also view a specific log by selecting an option from: Monitor > Logs (submenu). These logs are described in the following list:
- Audit Log. Displays all Application Events raised by the application modules and all Audit Syslog messages from the switches and Brocade HBAs.
- Product Event Log. Displays all Product Event type events from all discovered switches and Brocade HBAs.
- Fabric Log. (SAN only) Displays Product Events, Device Status, and Product Audit type events for all discovered fabrics.
- FICON Log. Displays all the RLIR and LIRR type events, for example, link incident type events.
- Product Status Log. (SAN only) Displays events which indicate a change in Switch Status for all discovered switches and Brocade HBAs.
- Security Log. Displays all security events for the discovered switches.
- Syslog Log. Displays syslog messages from switches and HBAs.
Master Log
The Master Log, which displays in the lower left area of the main window, lists the events and alerts that have occurred on the SAN. If you do not see the Master Log, select View > Show Panels > All Panels or press F5.
The following fields and columns are included in the Master Log:
- Severity. The severity of the event. When the same event (Warning or Error) occurs repeatedly, the Management application automatically eliminates the additional occurrences.
- Acknowledged. Whether the event is acknowledged or not. Select the check box to acknowledge the event.
- Source Name. The product on which the event occurred.
- Source Address. The IP address (IPv4 or IPv6 format) of the product on which the event occurred.
- Origin. The event source type (for example trap, pseudo-event, application, or syslog).
- Category. The type of event that occurred (for example, client/server communication events).
- Description. A description of the event.
- Last Event Server Time. The time and date the event last occurred on the server.
- Count. The number of times the event occurred.
- Module Name. The name of the module on which the event occurred.
- Message ID. The message ID of the event.
- Product Address. The IP address of the product on which the event originated.
- Contributor. The name of the contributor on which the event occurred.
- Node WWN. The world-wide name of the node on which the event occurred.
- Fabric Name. The name of the fabric on which the event occurred.
- Operational Status. The operational status (such as unknown, healthy, marginal, or down) of the product on which the event occurred.
- First Event Product Time. The time and date the event first occurred on the product.
- Last Event Product Time. The time and date the event last occurred on the product.
- First Event Server Time. The time and date the event first occurred on the server.
- Audit. The audit of the event.
- Virtual Fabric ID. The VFID of the product on which the event occurred.
- Zone Alias. Displays the zone alias of the product or port.
Collect SupportSave
To collect switch and Network Advisor supportsaves, select Monitor > Technical Support.
Network Advisor Supportsave
To collect a Network Advisor supportsave, select Monitor > Technical Support > Supportsave.
Supportsave Manual Collection
To collect a switch supportsave, select Monitor > Technical Support > Product / Host Supportsave, and select the Generate Now tab. From the panel on the left, select the switches or fabric from which you want to collect supportsave files and press the right arrow, which displays the selected devices in the right-hand panel. Once all the required switches are listed in the right-hand panel, press OK to start the supportsave collection process.
A dialog box indicating the supportsave has started will be displayed.
Messages in the Master Log will also indicate the start and completion of the supportsave.
Supportsave Scheduled Collection
To schedule switch supportsave collection, select Monitor > Technical Support > Product / Host Supportsave, and select the Schedule tab. Set the frequency for collecting the supportsave files (weekly recommended) and the day of the week and time to collect the files (Sunday evening is recommended). Select the switches or fabric from which you want to collect supportsave files and press the right arrow, which displays the selected devices in the right-hand panel. Once all the required switches are listed in the right-hand panel, press OK.
Event notification
Call Home
Network Advisor supports call home to IBM Support. This allows automatic creation of a problem record with IBM in response to significant error events on devices you are managing in your SAN. Additional information can be found at the following links:
- Brocade Network Advisor User Manual: a direct link to the Call Home section of the Brocade User Manual, which provides in-depth instruction on how to configure Call Home.
- IBM Network Advisor Call Home Setup: provides IBM-specific email addresses and phone numbers to use when configuring Call Home. You may need to consult with your security team to ensure your security model allows call home via email and/or phone.
SNMP
As accounts may not have identical infrastructures, SNMP traps should be configured to be sent to the event capture and reporting tool deployed for each account. You will need to work with the administrator of your SNMP trap collector (e.g. NetView, Netcool) to ensure all alerts noted in the sections below are defined properly and are being received.
NOTE: The recommendation is to configure SNMP v3. If your capture tool does not support this, use SNMP v1 (if you need to use SNMP v1, do not use the default).
Trap enablement tasks
Configuring individual SNMP traps must be done on a per-switch basis within the Web Tools interface. Enable SNMP per the following on each of your Brocade products (switches, directors, etc.).
1. From Web Tools, click on Switch Admin > Show Advanced Mode
2. This will bring you to the following screen; select SNMP here.
3. At the SNMPv3 Inform/Trap Recipient:
- Select a User Name.
- Provide an IP address for the Recipient IP.
- Set Trap Level to 3-Warning level.
Fabric Watch
Fabric Watch tracks a variety of SAN fabric elements and events. It monitors fabric-wide events, ports, and environmental parameters to enable early fault detection via SNMP.
Reasons to Implement Fabric Watch
IBM in general has not been manually monitoring for error conditions within our SAN environments to date and this has led to multiple customer impacts that could have been easily avoided.
Fabric Watch can be enabled and thresholds set to alert on these events for code level 6.3 and above.
Fabric Watch specific alerts to be enabled are documented below.
Fabric Watch should have been purchased with the switch (it is a FOS feature, and is included automatically with all Brocade SAN switches purchased from IBM).
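When configuring Fabric Watch, the Fabric/Port Class and Alert Type/Threshold settings below should be followed:
Class | Area | Alert Type | High Boundary | Time | Alert
SFP | ST | SFP State Change | 0 | Minutes | raslog
Fabric | ED | E_Ports Down | 0 | Minutes | raslog,snmp
Fabric | FC | Fabric Reconfigure | 0 | Minutes | raslog,snmp
Fabric | DC | Domain ID Changes | 0 | Minutes | raslog
Fabric | SC | Segmentation | 0 | Minutes | raslog,snmp
Fabric | ZC | Zone Changes | 10 | Minutes | raslog
Fabric | FL | Fabric Logins | 10 | Minutes | raslog
E_Port | ST | State Change | 10 | Minutes | raslog,snmp
E_Port | PE | Protocol Error | 5 | Minutes | raslog
E_Port | LR | Link Reset | 2 | Minutes | raslog,snmp
E_Port | ITW | Invalid Tx Words (enc_out) | 25 | Minutes | raslog
E_Port | CRC | Invalid CRCs | 5 | Minutes | raslog,snmp
E_Port | C3TX_TO | C3 Discards | 5 | Minutes | raslog,snmp
E_Port | RX | Rx Performance | 75% | Minutes | raslog
E_Port | TX | Tx Performance | 75% | Minutes | raslog
FOP_Port (Fibre Optical Port) | ST | State Change | 25 | Minutes | raslog
FOP_Port (Fibre Optical Port) | PE | Protocol Error | 5 | Minutes | raslog
FOP_Port (Fibre Optical Port) | LR | Link Reset | 25 | Minutes | raslog,snmp
FOP_Port (Fibre Optical Port) | ITW | Invalid Tx Words (enc_out) | 25 | Minutes | raslog
FOP_Port (Fibre Optical Port) | CRC | Invalid CRCs | 5 | Minutes | raslog,snmp
FOP_Port (Fibre Optical Port) | C3TX_TO | C3 Discards | 5 | Minutes | raslog,snmp
FOP_Port (Fibre Optical Port) | RX | Rx Performance | 90% | Minutes | raslog
FOP_Port (Fibre Optical Port) | TX | Tx Performance | 90% | Minutes | raslog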
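To make the "High Boundary" and "Alert" columns concrete, the sketch below encodes a subset of the E_Port rows and reports which alert channels would fire for a given per-minute count. It is illustrative Python only: the dictionary and function names are hypothetical, and the strict "above the boundary" comparison is an assumption based on the Above alarm setting used when configuring Fabric Watch.

```python
# Subset of the E_Port class settings: area -> (high boundary per minute, alert channels).
E_PORT_THRESHOLDS = {
    "PE": (5, ("raslog",)),
    "LR": (2, ("raslog", "snmp")),
    "ITW": (25, ("raslog",)),
    "CRC": (5, ("raslog", "snmp")),
    "C3TX_TO": (5, ("raslog", "snmp")),
}


def alerts_for(area: str, count_per_minute: int) -> tuple:
    """Return the alert channels triggered when the count is above the boundary."""
    boundary, channels = E_PORT_THRESHOLDS[area]
    return channels if count_per_minute > boundary else ()


if __name__ == "__main__":
    print(alerts_for("CRC", 6))   # above the boundary of 5
    print(alerts_for("CRC", 5))   # at the boundary, no alert
```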
Configuring Fabric Watch
1. Log in to Web Tools and open the Fabric Watch GUI:
2. Select the appropriate Class (F/FL Optical Port, E-Port, or Fabric) from the left screen pane:
3. From the Threshold Configuration tab at top, select Trait Configuration
4. Enter Time Base and High Boundary (from the settings noted above in this document)
5. Select Custom Defined and Apply
The example below will configure E_Ports to alert on CRC Errors which exceed 5 within 1 minute:
6. Select the Alarm Configuration tab
7. Select Above for ERROR_LOG, SNMP_TRAP (and EMAIL_ALERT if applicable). If email alerting is used you will need to provide an address via the Email Configuration tab (top right of screen in above example).
8. Select Custom Defined and Apply (this needs to be done for each alert)
9. Once parameters for all alerts have been set, the same configuration may be replicated to other switches
From the interface: Configure > Configuration > Replicate > Configuration
Configuration Type > Partial FC > Fabric Watch:
Select Configuration from the Switch:
Select the switch for which you just configured all Fabric Class, E_Port, and F_Port Class settings:
Select the other switches in your fabric for which you want to enable Fabric Watch (using same settings):
Following the above screen you will be presented with Validation and Summary screens to complete the distribution of Fabric Watch settings.
Bottleneck Credit Tools
The bottleneck credit tool is used to automatically reset back end ports when loss of credits is
detected on the back end ports. This function was introduced in Brocade FOS v7.0.0 and
v6.4.2 and was further enhanced with improved credit loss detection in FOS v7.0.1b and
v6.4.3
Enabling bottleneck credit tools
Use the --cfgcredittools commands to enable or disable credit recovery of back-end ports, and use the --showcredittools parameter to display the configuration. When this feature is enabled, credit is recovered on back-end ports (ports connected to the core blade or core blade back-end ports) when credit loss is detected on these ports. If complete loss of credit on a Condor 2 back-end port causes frame timeouts, an LR is performed on that port regardless of the configured setting, even if that setting is -recover off.
When used with the -recover onLrOnly option, the recovery mechanism takes the following escalating actions:
- When the mechanism detects credit loss, it performs an LR and logs a RASlog message (CX-1014).
- If the LR fails to recover the port, the port reinitializes. A RASlog message is generated (CX-1015). Note that the port reinitialization does not fault the blade.
- If the port fails to reinitialize, the port is faulted. A RASlog message (RAS CX-1016) is generated.
- If a port is faulted, and there are no more online back-end ports in the trunk, the port blade is faulted. A RASlog message (RAS CX-1017) is generated.
Enable the credit recovery tool with the onLrOnly option:
    bottleneckmon --cfgcredittools -intport -recover onLrOnly
Bottleneck Detection
As transmission speeds within SAN fabrics continue to increase, devices causing latency within the fabric have a larger impact on the overall health of the fabric. Devices causing latency have caused multiple customer impacts within IBM. Bottleneck Detection now provides a way to automatically watch for and alert upon high latency devices. This ability has already proven to shorten environment impact times within IBM operated environments from days to hours.
Recommendations
Field experience shows that the original strategy of enabling Bottleneck Detection with conservative values for latency thresholds almost always yields no results. There was a concern that aggressive values would result in Bottleneck Detection alert storms, but this has not been the case. Even the most aggressive values result in relatively few alerts being generated. As a result, it is now recommended that the most aggressive settings are tried first and then backed off gradually if too many alerts are seen.
Brocade 48000 should have no more than 100 ports monitored due to memory constraints.
Congestion Threshold (-cthresh): New starting with code level 6.4. This monitors bandwidth utilization: the percentage of time that a link exceeds 95% utilization. The recommendation is to stay with the Brocade default value for this setting (80%). This means that if an individual link exceeds 95% utilization for 80% or more of the measurement interval (the time specification = 30 seconds), an alert will be sent.
Latency Threshold (-lthresh): This is the minimum percent of time when a latency is detected (default is 20%, or 0.2). This is the parameter we will adjust as we fine-tune Bottleneck Detection.
Window: Specifies the measurement interval for measuring latency.
Quiet Time: Specifies how often to send any tripped alerts.
Suggested Bottleneck Settings
FOS 6.3
Parameter | Conservative Settings | Normal Settings | Aggressive Settings
-time | 300 | 60 | 5
-qtime | 300 | 60 | 1
-thresh | 0.3 | 0.2 | 0.1
FOS 6.4
Parameter | Conservative Settings | Normal Settings | Aggressive Settings
-time | 300 | 60 | 5
-qtime | 300 | 60 | 1
-lthresh | 0.3 | 0.2 | 0.1
-cthresh | 0.8 | 0.5 | 0.1
FOS 7.0
Parameter | Conservative Settings | Normal Settings | Aggressive Settings
-time | 300 | 60 | 5
-qtime | 300 | 60 | 1
-lthresh | 0.3 | 0.2 | 0.1
-cthresh | 0.8 | 0.5 | 0.1
-lsubsectimethresh | 0.8 | 0.5 | 0.5 (no less)
-lsubsecsevthresh | 75 | 50 | 1
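When moving between the conservative, normal, and aggressive settings, it can help to generate the command line from the suggested values rather than retype it. The following Python sketch only builds a string; it does not connect to a switch. The names PROFILES, FLAGS_BY_FOS, and bottleneckmon_command are hypothetical, and the flag order and the trailing -alert option are assumptions taken from the CLI examples in this guide, so verify them against the command reference for your FOS level.

```python
# Suggested values, kept as strings so they are emitted exactly as tabulated.
PROFILES = {
    "conservative": {"-time": "300", "-qtime": "300", "-lthresh": "0.3",
                     "-cthresh": "0.8", "-lsubsectimethresh": "0.8",
                     "-lsubsecsevthresh": "75"},
    "normal": {"-time": "60", "-qtime": "60", "-lthresh": "0.2",
               "-cthresh": "0.5", "-lsubsectimethresh": "0.5",
               "-lsubsecsevthresh": "50"},
    "aggressive": {"-time": "5", "-qtime": "1", "-lthresh": "0.1",
                   "-cthresh": "0.1", "-lsubsectimethresh": "0.5",
                   "-lsubsecsevthresh": "1"},
}

# Flags applicable to each FOS level, in the order used by the CLI examples.
FLAGS_BY_FOS = {
    "6.4": ["-lthresh", "-cthresh", "-time", "-qtime"],
    "7.0": ["-lthresh", "-cthresh", "-time", "-qtime",
            "-lsubsectimethresh", "-lsubsecsevthresh"],
}


def bottleneckmon_command(fos: str, profile: str) -> str:
    """Assemble a bottleneckmon --enable command line for a FOS level and profile."""
    values = PROFILES[profile]
    parts = ["bottleneckmon", "--enable"]
    for flag in FLAGS_BY_FOS[fos]:
        parts += [flag, values[flag]]
    parts.append("-alert")
    return " ".join(parts)


if __name__ == "__main__":
    print(bottleneckmon_command("7.0", "aggressive"))
```

Starting from the aggressive profile and regenerating the command for the normal profile matches the "try aggressive first, then back off" recommendation above.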
Implementation
NOTE: The bottleneck detection feature detects latency bottlenecks only at the point of egress, not ingress.
Enable Bottleneckmon via GUI
1. Select Monitor > Performance > Bottlenecks.
The Bottlenecks dialog box displays.
2. Select Enable if it is not already selected.
3. Select the Alerts check box to enable alerts.
4. Use the below for your initial settings (see section below for additional tuning settings):
- Congestion 50%
- Latency 20%
- Window 60 seconds
- Quiet Time 60 seconds
5. Select Ports from the Products/Ports list. Select only F_ports.
6. Click the right arrow to apply the settings in the Bottleneck Detection pane to the selected elements in the Products/Ports list.
7. Click OK or Apply to save your changes.
8. See next section for tuning your initial settings.
Enable Bottleneckmon via CLI
FOS 6.4
    bottleneckmon --enable -lthresh 0.2 -cthresh 0.5 -time 60 -qtime 60 -alert
FOS 7.0
    bottleneckmon --enable -lthresh 0.2 -cthresh 0.5 -time 60 -qtime 60 -lsubsectimethresh 0.5 -lsubsecsevthresh 50 -alert
How Bottlenecks are reported in Network Advisor
Bottlenecks are reported through alerts in the Master Log. A bottleneck cleared alert is sent when the bottleneck is cleared.
NOTE: A bottleneck cleared alert is sent if you disable bottleneck detection on a bottlenecked port, even though the port is still bottlenecked.
Bottlenecks can be highlighted in the Connectivity Map and Product List. Select Monitor > Performance > View Bottlenecks. If a port is experiencing a bottleneck, a bottleneck icon is displayed in the Connectivity Map for the switch and fabric, and in the Product List for the port, switch, and fabric. In the figure below, port15 and port22 are bottlenecked.
Port Fencing
Reasons to Implement Port Fencing
As transmission speeds within SAN fabrics continue to increase, devices causing latency within the fabric have a larger impact on the overall health of the fabric. The health of the fabric may degrade faster than an alert can be sent, received by the monitoring team, support tickets opened, and the required manual action to protect the fabric be taken.
Port Fencing provides a way to have the fabric respond to error-level thresholds by disabling ports with high error rates. It sends an alert that this action has been taken so the steady state team can repair the situation and then bring the port back online.
Implementation
NOTE: Port Fencing should only be done after the environment has successfully implemented Fabric Watch using the settings recommended in this guide. Healthy SAN fabrics are a prerequisite to implementation of Port Fencing.
DO NOT implement Port Fencing unless the following criteria are met:
- The environment is running code level 7.0.2c or newer. In code levels prior to 7.0.2c, the FW-1510 alert sent by the switch to inform administrators that Port Fencing has disabled ports is at an Informational severity level. This alert severity has been raised to Error in the 7.0.2c release.
- The monitoring or steady state team has the cycles to monitor Informational SNMP alerts from the SAN switches.
- A mature SNMP monitoring and response process must be in place prior to implementation of Port Fencing. Port Fencing is going to disable ports; a steady state team must receive these alerts and take action to fix the port and bring it back online. Failure to take action will result in future Client Impacting Events.
Example: 1 of 2 SAN ports for a server exceeds the Port Fencing threshold and the port is automatically disabled by the SAN switch. The steady state team does not repair the port and bring it back online. A month later the remaining HBA in the server fails, and the server now has no connectivity to back-end SAN storage devices.
When configuring Port Fencing within FOS v6.4.2a, the Violation Type and Threshold settings below should be followed:
E Port Class Area (note: the Time Base for all Alerts = 1 minute)
Violation Type | Threshold
Protocol Error | 10
Link Reset | 10
Invalid Words (enc out) | 60
Invalid CRCs | 30
F Port Class Area (note: the Time Base for all Alerts = 1 minute)
Violation Type | Threshold
Protocol Error | 5
Link Reset | 200
Invalid Words (enc out) | 40
Invalid CRCs | 20
C3 Discards (C3TX_TO) | 40
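For a quick what-if check against the values above, the sketch below lists which violation types would cause a port to be fenced for a given set of one-minute counts. It is illustrative Python, not switch logic: the names are hypothetical, and it assumes a port is fenced only when a count exceeds (rather than equals) its threshold.

```python
# Port Fencing thresholds per port class, counted over a 1-minute time base.
PORT_FENCING_THRESHOLDS = {
    "E_Port": {"Protocol Error": 10, "Link Reset": 10,
               "Invalid Words": 60, "Invalid CRCs": 30},
    "F_Port": {"Protocol Error": 5, "Link Reset": 200,
               "Invalid Words": 40, "Invalid CRCs": 20, "C3 Discards": 40},
}


def violations(port_class: str, counts_last_minute: dict) -> list:
    """Return the violation types whose count exceeds the class threshold."""
    limits = PORT_FENCING_THRESHOLDS[port_class]
    return sorted(
        name for name, count in counts_last_minute.items()
        if name in limits and count > limits[name]
    )


if __name__ == "__main__":
    # Hypothetical counters for one F_Port over the last minute.
    print(violations("F_Port", {"Invalid CRCs": 21, "Link Reset": 200}))
```

Violation types that are not defined for a port class (for example C3 Discards on an E_Port) are ignored rather than treated as errors.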
Adding thresholds (Violation types):
1. To access Port Fencing select: Monitor > Fabric Watch > Port Fencing. The Port Fencing dialog box displays:
2. Select C3 Discard Frames from Violation Type and click Add
3. In the pop-up window, enter a Name, select Custom, enter Threshold, and Time (per parameters noted above)
Assigning thresholds to ports:
To assign an existing threshold to a port type, complete the following steps.
1. Select Monitor > Fabric Watch > Port Fencing. The Port Fencing dialog box displays
2. Select a threshold type from the Violation Type list
3. Select the threshold you want to assign from the Thresholds table
4. Select the Port Type (E Port Class or F Port Class noted above), to which you want to assign the threshold from the Ports table. Do NOT assign a Port Type/Class to an incorrect Violation Type.
5. Click the right arrow
A directly assigned icon displays next to the objects you selected in the Ports table to show that the threshold was applied at this level.
An added icon appears next to every object in the tree to which the new threshold is applied.
6. Click OK on the Port Fencing dialog box.
Unblocking a Port
Network Advisor allows you to unblock a port (only if it was blocked by Port Fencing) once the problem that triggered the threshold is fixed.
When a port is blocked, an Attention icon displays next to the port node.
To unblock a port, complete the following steps.
1. Select Monitor > Fabric Watch > Port Fencing.
The Port Fencing dialog box displays.
2. Right-click anywhere in the Ports table and select Expand.
3. Select a blocked port from the Ports table.
4. Click Unblock.
5. Click OK on the message.
If you did not solve the root problem, the threshold will trigger again.
6. Click OK on the Port Fencing dialog box.
Removing Thresholds
To remove thresholds from the All Fabrics object, an individual Fabric, Chassis group, Switch, or Switch Port, complete the following steps.
1. Select Monitor > Fabric Watch > Port Fencing.
The Port Fencing dialog box displays.
2. Select a threshold type from the Violation Type list.
3. Select the object with the threshold you want to remove in the Ports table.
4. Click the left arrow.
Brocade Fabric Vision
Brocade Fabric Vision is a collection of hardware and software functions in FOS 7.2 and Gen 5 Fibre Channel switches. Fabric Vision consists of the following elements:
- MAPS (Monitoring and Alerting Policy Suite): recommended; see Monitoring and Alerting Policy Suite (MAPS)
- Bottleneck Detection: recommended; see Bottleneck Detection
- Credit Loss Detection: recommended; see Bottleneck Credit Tools
- Forward Error Correction: enabled on Gen 5 hardware switches
- Brocade ClearLink Diagnostics: for installation and diagnostic use
- Network Advisor Dashboards: recommended; see Network Advisor Dashboards
- Flow Vision (includes Flow Monitoring, Flow Mirroring and Flow Generation): for advanced PD only
Some Fabric Vision technology features are supported on Gen 4 b-type platforms; others are available only on Gen 5 Fibre Channel platforms with 16 Gbps performance capability. The chart below shows the various Fabric Vision technology features supported on each generation of products:
Feature | Gen 4 Platforms (8 Gbps FC and associated capabilities) | Gen 5 Platforms (16 Gbps FC and associated capabilities)
Latency Bottleneck Detection | Yes | Yes
Forward Error Correction | No | Yes
VC-level BB_Credit Recovery | No | Yes
ClearLink Diagnostics (D_Port) | No | Yes
MAPS | Yes | Yes
Flow Monitoring | Yes, with some limitations | Yes
Flow Mirroring | No | Yes
Flow Generator | No | Yes
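When planning which features to rely on in a mixed fabric, the chart above can be kept as a small lookup. This is an illustrative Python sketch; the dictionary and function names are hypothetical, and the entries simply transcribe the chart.

```python
# Feature -> support statement per platform generation, transcribed from the chart.
FABRIC_VISION_SUPPORT = {
    "Latency Bottleneck Detection": {"Gen 4": "Yes", "Gen 5": "Yes"},
    "Forward Error Correction": {"Gen 4": "No", "Gen 5": "Yes"},
    "VC-level BB_Credit Recovery": {"Gen 4": "No", "Gen 5": "Yes"},
    "ClearLink Diagnostics (D_Port)": {"Gen 4": "No", "Gen 5": "Yes"},
    "MAPS": {"Gen 4": "Yes", "Gen 5": "Yes"},
    "Flow Monitoring": {"Gen 4": "Yes, with some limitations", "Gen 5": "Yes"},
    "Flow Mirroring": {"Gen 4": "No", "Gen 5": "Yes"},
    "Flow Generator": {"Gen 4": "No", "Gen 5": "Yes"},
}


def is_supported(feature: str, generation: str) -> bool:
    """True when the chart lists the feature as supported, with or without limits."""
    return FABRIC_VISION_SUPPORT[feature][generation].startswith("Yes")


def features_missing_on(generation: str) -> list:
    """Features the chart lists as unsupported on the given generation."""
    return [f for f in FABRIC_VISION_SUPPORT if not is_supported(f, generation)]


if __name__ == "__main__":
    print(features_missing_on("Gen 4"))
```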
Monitoring and Alerting Policy Suite (MAPS)
The Monitoring and Alerting Policy Suite (MAPS) is a storage area network (SAN) health monitor supported on all switches running Fabric OS 7.2.0 or later. It replaces Fabric Watch as the default health monitor once FOS is at v7.2.0 or later. MAPS allows you to enable each switch to constantly monitor itself for potential faults and automatically alerts you to problems before they become failures.
It is recommended to set up MAPS from scratch rather than migrate the Fabric Watch settings, unless Fabric Watch was set up for a specific reason. See Initial MAPS setup.
MAPS Licensing Requirements and Software Prerequisites
Switches with both Fabric Watch and Advanced Performance Monitor licenses automatically get the Fabric Vision license features by upgrading to FOS v7.2.
Switches with only Fabric Watch or only Advanced Performance Monitor can upgrade to Fabric Vision by purchasing the other license (either Fabric Watch or Advanced Performance Monitor).
MAPS Software Prerequisites:
FOS Version: v7.2.0d
IBM Network Advisor: 12.13 or higher.
NOTE: MAPS is the follow-on product to Fabric Watch, and while both require a license, Fabric Watch customers can upgrade to MAPS without additional cost.
If the switch currently has Fabric Watch set up and properly monitoring the fabric, those Fabric Watch settings can be migrated to MAPS rules.
Differences between Fabric Watch and MAPS configurations
Configuration | Fabric Watch behavior | MAPS behavior
End-to-End monitoring (Performance Monitor class) | Supported | Supported through flows.
Frame monitoring (Performance Monitor class) | Supported | Supported through flows.
RX, TX monitoring | Occurs at the individual physical port level. | Occurs at the trunk or port level as applicable.
Pause/Continue behavior | Occurs at the element or counter level. For example, monitoring can be paused for CRC on one port and for ITW on another port. | Occurs at the element level. Monitoring can be paused on a specific port, but not for a specific counter on that port.
CPU/Memory polling interval | Can configure the polling interval as well as the repeat count. | This configuration can be migrated from Fabric Watch, but cannot be changed.
E-mail notification configuration | Different e-mail addresses can be configured for different classes. | E-mail configuration supported globally.
Temperature sensor monitoring | Can monitor temperature values. | Can monitor only the states of the sensors (In_Range or Out_of_range).
Converting from Fabric Watch to MAPS
4. Back up the switch configuration using configupload.
5. Use mapsconfig --fwconvert to convert Fabric Watch rules to MAPS. If Fabric Watch is currently in use, this needs to be done before enabling MAPS, to preserve the Fabric Watch thresholds. Three new MAPS policies are created: fw_active_policy, based on the Fabric Watch settings currently active; fw_default_policy, based on the Fabric Watch default settings; and fw_custom_policy, based on any Fabric Watch custom policies that were created.
6. The conversion is one way; you cannot convert MAPS rules back to Fabric Watch.
7. The first time you enable MAPS, using the command
    mapsconfig --enablemaps -policy fw_active_policy
you will receive a warning (screenshot of the same given below).
8. Set allowable actions for rules using
    mapsconfig --actions raslog,snmp,email,sw_critical,sw_marginal,sfp_marginal
Make sure port fencing is not enabled / included in the mapsconfig command.
Initial MAPS setup
For switches running FOS 7.2 or higher that do not have Fabric Watch currently configured to monitor and alert for fabric events, or if a clean MAPS setup is required, use the following procedure.
The recommended port monitoring strategy is to log marginal port events to the RAS log, which should be reviewed on a regular basis, and to generate SNMP or email alerts for serious port events that need immediate attention.
Note: To implement this policy you can simply import the IBM_SO policy; see Importing MAPS configuration.
Since the MAPS default policies generate SNMP / email alerts for all of their port events, the strategy is to copy the default policy as a base, but to replace the port rules with rules that implement the above strategy based on the settings defined for Fabric Watch.
9. Create a copy of the MAPS default moderate policy as a base
    mapspolicy --clone dflt_moderate_policy -name IBM_SO
10. Remove the port rules from the policy using the following commands.
Note: Must be run from root
    for i in $(mapspolicy --show IBM_SO | grep defNON | awk '{print $1}'); do mapspolicy --delrule IBM_SO -rulename $i; done
    for i in $(mapspolicy --show IBM_SO | grep E_PORTS | awk '{print $1}'); do mapspolicy --delrule IBM_SO -rulename $i; done
    for i in $(mapspolicy --show IBM_SO | grep F_PORTS | awk '{print $1}'); do mapspolicy --delrule IBM_SO -rulename $i; done
    for i in $(mapspolicy --show IBM_SO | grep T_PORTS | awk '{print $1}'); do mapspolicy --delrule IBM_SO -rulename $i; done
    mapspolicy --delrule IBM_SO -rulename defSWITCHSEC_TS_D10
11. Create new F-Port rules for the new IBM_SO policy
    mapsRule --create F_PORTS_PE_5 -group ALL_F_PORTS -timebase min -op g -policy IBM_SO -monitor PE -value 5 -action RASLOG
    mapsRule --create F_PORTS_ITW_25 -group ALL_F_PORTS -timebase min -op g -policy IBM_SO -monitor ITW -value 25 -action RASLOG
    mapsRule --create F_PORTS_CRC_5 -group ALL_F_PORTS -timebase min -op g -policy IBM_SO -monitor CRC -value 5 -action RASLOG,SNMP,EMAIL
    mapsRule --create F_PORTS_CRC_H25 -group ALL_F_PORTS -timebase hour -op g -policy IBM_SO -monitor CRC -value 25 -action RASLOG,SNMP,EMAIL
    mapsRule --create F_PORTS_LR_3 -group ALL_F_PORTS -timebase min -op g -policy IBM_SO -monitor LR -value 3 -action RASLOG,SNMP,EMAIL
    mapsRule --create F_PORTS_LR_H10 -group ALL_F_PORTS -timebase hour -op g -policy IBM_SO -monitor LR -value 10 -action RASLOG,SNMP,EMAIL
    mapsRule --create F_PORTS_C3TXTO_5 -group ALL_F_PORTS -timebase min -op g -policy IBM_SO -monitor C3TXTO -value 5 -action RASLOG,SNMP,EMAIL
    mapsRule --create F_PORTS_TX_90 -group ALL_F_PORTS -timebase min -op g -policy IBM_SO -monitor TX -value 90 -action RASLOG
    mapsRule --create F_PORTS_RX_90 -group ALL_F_PORTS -timebase min -op g -policy IBM_SO -monitor RX -value 90 -action RASLOG
12. Create new E-Port rules for the new IBM_SO policy
mapsRule --create E_PORTS_PE_5 -group ALL_E_PORTS -timebase min -op g -policy IBM_SO -monitor PE -value 5 -action RASLOG mapsRule --create E_PORTS_ITW_25 -group ALL_E_PORTS -timebase min -op g -policy IBM_SO -monitor ITW -value 25 -action RASLOG mapsRule --create E_PORTS_CRC_5 -group ALL_E_PORTS -timebase min -op g -policy IBM_SO -monitor CRC -value 5 -action RASLOG,SNMP,EMAIL mapsRule --create E_PORTS_CRC_H25 -group ALL_F_PORTS -timebase hour -op g -policy IBM_SO -monitor CRC -value 25 -action RASLOG,SNMP,EMAIL mapsRule --create E_PORTS_LR_3 -group ALL_E_PORTS -timebase min -op g -policy IBM_SO -monitor LR -value 3 -action RASLOG,SNMP,EMAIL mapsRule --create E_PORTS_LR_H10 -group ALL_F_PORTS -timebase hour -op g -policy IBM_SO -monitor LR -value 10 -action RASLOG,SNMP,EMAIL mapsRule --create E_PORTS_ST_1 -group ALL_E_PORTS -timebase min -op g -policy IBM_SO -monitor STATE_CHG -value 1 -action RASLOG,SNMP,EMAIL mapsRule --create E_PORTS_C3TXTO_5 -group ALL_E_PORTS -timebase min -op g -policy IBM_SO -monitor C3TXTO -value 5 -action RASLOG,SNMP,EMAIL mapsRule --create E_PORTS_TX_75 -group ALL_E_PORTS -timebase min -op g -policy IBM_SO -monitor TX -value 75 -action RASLOG mapsRule --create E_PORTS_RX_75 -group ALL_E_PORTS -timebase min -op g -policy IBM_SO -monitor RX -value 75 -action RASLOG13. Enable the IBM_SO policy
mapsConfig --enablemaps -policy IBM_SO14. Set allowable actions
mapsconfig --action RASLOG,SW_CRITICAL,SW_MARGINAL,SW_HEALTHY,SFP_MARGINALImporting MAPS configuration
It is possible to import a MAPS policy and its rules instead of manually setting up MAPS as per the above section. There is an IBM_SO MAPS policy which is available and can be imported to enable setting up MAPS quickly.
Open the MAPS Configuration dialog by selecting Monitor->Fabric Vision->MAPS->Configure
Select the switch you want to import the MAPS policy into and select the IMPORT button.
Select the IBM_SO xml file
A progress message is displayed during the import.
A final status message is displayed when the import is completed.
To activate the policy, expand the list of policies for the switch, select the IBM_SO policy and press the Activate push button.
To enable the appropriate actions for the switch, select the switch and press the Actions push button.
Typically all actions except Fence are enabled.
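For sites that build the IBM_SO policy by hand instead of importing it, the mapsRule commands in the initial MAPS setup procedure follow a regular pattern, so they can be generated from a small threshold table rather than typed one by one. The following is a minimal sketch under stated assumptions: the helper names (rule_name, maps_rule_commands) and the table layout are hypothetical, the thresholds are copied from the F-Port rules in this guide, and the script only prints text. It does not connect to a switch, and the output should be reviewed before it is pasted into a FOS session.

```python
# Thresholds transcribed from the F-Port rules in the manual setup procedure.
# Each entry: (monitor, timebase, value, alert). alert=True adds SNMP and EMAIL
# to the RASLOG action for events that need immediate attention.
F_PORT_RULES = [
    ("PE", "min", 5, False),
    ("ITW", "min", 25, False),
    ("CRC", "min", 5, True),
    ("CRC", "hour", 25, True),
    ("LR", "min", 3, True),
    ("LR", "hour", 10, True),
    ("C3TXTO", "min", 5, True),
    ("TX", "min", 90, False),
    ("RX", "min", 90, False),
]


def rule_name(prefix, monitor, timebase, value):
    """Follow the guide's naming pattern, e.g. F_PORTS_CRC_H25 or E_PORTS_ST_1."""
    short = "ST" if monitor == "STATE_CHG" else monitor
    hour_tag = "H" if timebase == "hour" else ""
    return f"{prefix}_{short}_{hour_tag}{value}"


def maps_rule_commands(port_type, rules, policy="IBM_SO"):
    """Build mapsRule --create command strings for one port type ("F" or "E").

    The group is derived from the port type, so a rule named E_PORTS_* can
    never be attached to ALL_F_PORTS by a copy-and-paste slip.
    """
    prefix = f"{port_type}_PORTS"
    group = f"ALL_{port_type}_PORTS"
    commands = []
    for monitor, timebase, value, alert in rules:
        actions = "RASLOG,SNMP,EMAIL" if alert else "RASLOG"
        commands.append(
            f"mapsRule --create {rule_name(prefix, monitor, timebase, value)} "
            f"-group {group} -timebase {timebase} -op g -policy {policy} "
            f"-monitor {monitor} -value {value} -action {actions}"
        )
    return commands


f_port_commands = maps_rule_commands("F", F_PORT_RULES)
for command in f_port_commands:
    print(command)
```

The E-Port rules can be produced the same way by passing "E" and a second table with the E-Port thresholds.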
Replicating a policy to other devices
You can replicate a non-default policy on a device to all MAPS-capable devices in a Fabric or SAN.
NOTE: Copying a policy from one device to another overwrites any policy with a matching name on the target devices
Right-click a device in the Product List or Connectivity Map and select Fabric Vision > MAPS > Configure
The MAPS Configuration dialog box displays.
Select a non-default policy on a device (source) you want to replicate in the list and click Distribute.
The Distribution Options dialog box displays.
Set the destination by choosing one of the following options:
All fabric distribution: Select to replicate the policy on all MAPS-capable devices in the SAN.
Within fabric distribution: Select to replicate the policy on all MAPS-capable devices in the selected Fabric.
Set the activation parameters by choosing one of the following options:
Activate policy on each switch: Select to immediately activate the policy on the target devices after distribution. If the selected policy is not an active policy, Activate after distribution activates the policy on the source device as well as the target devices.
Do not activate policy on each switch: Select to not activate the policy on the target devices after distribution.
Click OK on the Distribution Options dialog box.
The selected policy is replicated on all MAPS-capable devices in the selected Fabric or SAN.
If you chose to activate the policy after distribution, the selected policy is activated on the target devices and, if necessary, on the source device.
Click Close on the MAPS Configuration dialog box.
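Because distribution overwrites any policy with a matching name on the target devices, it can be worth listing which targets already hold a policy of that name before clicking Distribute. The sketch below is illustrative only: the device names and the policy inventory are made-up sample data, the function name is hypothetical, and it assumes policy names are compared as exact strings. How the inventory is gathered from the switches is left out.

```python
def devices_with_name_clash(policy_name, target_policies):
    """Return the target devices that already hold a policy called policy_name.

    target_policies maps a device name to the set of policy names on it.
    """
    return sorted(
        device
        for device, names in target_policies.items()
        if policy_name in names
    )


# Hypothetical inventory of three target switches.
targets = {
    "switch_a": {"dflt_moderate_policy", "IBM_SO"},
    "switch_b": {"dflt_moderate_policy"},
    "switch_c": {"IBM_SO", "fw_active_policy"},
}

clashes = devices_with_name_clash("IBM_SO", targets)
print("Policy would be overwritten on:", clashes)
```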
MAPS and Bottleneck Monitor
1 In FOS v7.2 the MAPS dashboard (mapsdb --show) integrates bottleneck events. Bottleneck events are reported in the summary section of the report output.
2 The MAPS dashboard is used only for logging bottleneck latency events. Congestion bottleneck events are not logged on the MAPS dashboard.
3 The MAPS dashboard continues to log events whether RASLogs are set to on or off in the bottleneck configuration.
4 The MAPS dashboard history section updates its display of CRED_ZERO (measured in millions) and BN_SECS values at one-minute intervals.
4.1 BN_SECS indicates the total seconds that were marked as being affected by bottlenecks since the previous midnight.
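Since BN_SECS counts seconds since the previous midnight, the raw number is easier to judge when it is related to how much of the day has elapsed. The following is a small worked sketch, not a parser for the dashboard output: the function name is hypothetical, the BN_SECS value is assumed to have been read off the dashboard by hand, and "midnight" is assumed to mean midnight in the switch's local time.

```python
from datetime import datetime


def bottleneck_share(bn_secs, now):
    """Fraction of the time elapsed since midnight that was marked as bottlenecked."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = (now - midnight).total_seconds()
    if elapsed <= 0:
        return 0.0
    # Cap at 1.0 in case the counter and the clock are slightly out of step.
    return min(bn_secs / elapsed, 1.0)


# Example: BN_SECS = 540 read at 06:00 means 540 of 21600 seconds were affected.
share = bottleneck_share(540, datetime(2014, 3, 1, 6, 0, 0))
print(f"{share:.1%} of the day so far was affected by bottlenecks")
```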
Enable MAPS in Network Advisor
1. Log in to NA. From the Monitor menu choose the Fabric Vision sub menu, select MAPS and Enable
Activate MAPS Policy from Network Advisor
Log In to NA. From the Monitor menu choose the Fabric Vision sub menu, select MAPS and Configure
Highlight the switch to be configured, select the dflt_moderate_policy or IBM_SO policy and click the Activate button.
Confirm dflt_moderate_policy or IBM_SO is now the active policy.
View the Parameters in a Policy
Log In to NA. From the Monitor menu choose the Fabric Vision sub menu, select MAPS and Configure
Highlight the switch with the policy to be viewed, select the policy and click the View button.
Choose the tab related to the parameter to be viewed (Port, Switch Status, Fabric, FRU, Security, Resource, FCIP, Traffic/Flows)
Network Advisor Dashboards
The IBM Network Advisor Dashboard widgets, Event Logs, and SAN Health described below are useful tools for everything from quick assessments to in-depth investigation of the overall health of your SAN.
Dashboard Tab - at a glance
The Dashboard tab provides a high-level overview of the network and the current states of managed devices. This allows you to easily check the status of the devices on the network. The dashboard also provides several features to help you quickly access reports, device configuration, and system logs. The dashboard updates every 5 seconds regardless of the currently selected tab (SAN or Dashboard) or the SAN size. However, data may become momentarily out of sync between the dashboard and other areas of the application. For example, if you remove a product from the network while another user navigates from the dashboard to a more detailed view of the product, the product may not appear in the detailed view.
The Dashboard includes the following widgets:
1. SAN Operational Status. Displays the device status as a pie chart. Displays the device status as a percentage of the total number of devices. Displays the percentage in various colors on each slice. Displays the color legend below the pie chart. Displays tooltips on mouse-over to show the number of devices in that state. When there is one status category with less than one percent of the total number of devices, the status widget displays the number of devices in each category on each slice.
2. SAN Inventory. Displays the SAN products inventory as stacked bar graphs. Displays each group as a separate bar on the graph. Displays the current state of all products discovered for a group in various colors on each bar. Displays the color legend below the y-axis. Displays tooltips on mouse-over to show the number of devices in that state.
3. Events. Displays the number of events by severity level for a specified time range as a stacked bar graph. You can customize this widget to display a specific time range. Options include: This Hour, Last Hour, Last 24 Hours, Last 7 Days, or Last 30 Days.
Brocade SAN Health Report
SAN Health is a powerful (and free) utility from Brocade for surveying your SAN. SAN Health should be run at least monthly. Doing so will help you recognize trends in your environment and uncover current or potential issues you were not aware of. Keeping a regular set of SAN Health reports can also aid in troubleshooting, as the reports provide a detailed history of events taking place in your SAN.
You can download this utility and instructions for using it from Brocade at:
www.brocade.com/sanhealth
Brocade SAN Health reports contain information such as the following:
Fabric level information: total port count, performance, oversubscription ratios, port utilization, and number of attached devices, followed by specific information on each fabric, such as the connected switches, zoning configuration, and a port map.
Switch level information such as licenses, port level configurations and ISL usage.
Port level information such as bandwidth utilization, CRC counts and port status, which provides a snapshot of overall port health.
Visio diagram: shows the logical connection of the switches in the fabrics as well as the connected devices. ISLs, trunks and devices are shown exactly how they are connected to the switch ports. From this diagram, the fabric topology and other information can be viewed quickly and easily.
Customized views of devices allow for online device identification, a snapshot of performance stats and switch attachment details.
Other items in this report include historical performance graphs plus guidelines and recommendations.
NOTE: Past reports should be saved for trend, troubleshooting and planning purposes. These reports can be very helpful when trying to identify the source of an issue and should be readily available for Crit-Sit and Sev-1 types of situations.
Instructions For Usage
1. Identify the name of the customer in the SAN Health .BSH upload file name
Good Example: James_Smith_120203_1201_ACMEcompany_LexingtonKY.BSH
Bad Example: James_Smith_130610_1454_LEX_FAB.BSH (we do not know what acct in LEX)
2. After uploading the .BSH file to [email protected], send a follow-up email to the Brocade alias [email protected] letting the Brocade Team know the SH is coming. This will avoid duplicate efforts and allow a faster response than sending it to individual members.
a. Include full file name(s) that were uploaded e.g. James_Smith_120203_1201_ACMEcompany_LexingtonKY.BSH
When sending any emails to Brocade, please be sure to include:
Your name
Your eMail and phone number
Customer name
The geography the device/s will be (are) installed
Device Type / Model, and quantity
Account Focal e.g. SAN Architect, DPE, & etc. - name and contact information
Description of the problems or the reason for requesting a SAN Health review
b. For any open PMR/SRs, list the numbers
3. When configuring the SH client (http://www.brocade.com/services-support/drivers-downloads/san-health-diagnostics/download_san_health.page), be sure to include [email protected] (see the screenshot below):
4. Select the option to create a separate Visio for each fabric.
5. Clear the stats on all switches by running slotstatsclear and statsclear at least 24 hours prior to running the SH report.
6. Set performance to capture a minimum of two hours for graphs.
7. Make sure you are using the latest client, v3.2.6c, downloaded from http://www.brocade.com/services-support/drivers-downloads/san-health-diagnostics/download_san_health.page
8. Follow-up SAN Health review requests must include the status of all actions called out in the previous review's Brocade Recommendation Summary.
Zoning
All zoning tasks must be performed from the Zoning dialog box in the Network Advisor application. You can access the Zoning dialog box from the main screen of the Management application using any of the following methods:
Select Configure > Zoning > Fabric.
Click the Zoning icon on the toolbar.
Right-click a port, a switch, a switch group, or fabric in the device list and select Zoning.
Right-click a port, a switch, a switch group, or fabric in the Connectivity Map and select Zoning.
NOTE: The following points need to be observed when performing zoning operations
Zoning via the CLI or Web Tools interface should never be performed due to the increased potential for catastrophic customer-impacting mistakes associated with these methods.
Single-Initiator Zoning should be used for all zoning. A single-initiator zone contains one HBA in a zone with target device/s.
Your default zoning mode should be set to No Access. This means unzoned devices cannot see each other, so a zone must be established before they can communicate.
The following is a procedure for zoning in a Brocade fabric using IBM Network Advisor. It ensures the following:
The current zone configuration in the fabric will be saved to the Network Advisor offline repository and can be restored to the fabric if necessary.
Multiple copies of the fabric zoning configuration will be stored in the offline repository. The number of copies will be dependent on your policy for cleaning out old zone DB copies in the offline repository.
The offline repository will be backed up as part of the scheduled Network Advisor backup when that backup occurs. There will be exposure to lost updates to the zoning DBs should the Network Advisor server become unavailable and have to be restored. The updates from the time of the last backup until the time the server is lost would be unrecoverable.
The current active Fabric Zone DB will always be the zoning DB used for updating when zoning changes are necessary in the fabric. The offline repository zone DBs will only be used for recovery if necessary.
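The single-initiator rule in the notes above lends itself to a mechanical check on a planned zone layout before anything is activated. The sketch below is one possible way to express that check, not a Network Advisor feature: the function name, the zone names and the WWPNs are all hypothetical sample data, and it assumes you already know which WWPNs belong to HBAs (initiators).

```python
def non_single_initiator_zones(zones, initiators):
    """Return the names of zones that do not contain exactly one initiator.

    zones maps a zone name to a list of member WWPNs.
    initiators is the set of WWPNs known to belong to HBAs.
    """
    offending = []
    for name, members in zones.items():
        initiator_count = sum(1 for member in members if member in initiators)
        if initiator_count != 1:
            offending.append(name)
    return sorted(offending)


# Hypothetical WWPNs: two HBAs and two storage target ports.
hba_wwpns = {"10:00:00:00:c9:00:00:01", "10:00:00:00:c9:00:00:02"}
planned_zones = {
    "zone1": ["10:00:00:00:c9:00:00:01", "50:05:07:68:00:00:00:0a"],
    "zone2": ["10:00:00:00:c9:00:00:02", "50:05:07:68:00:00:00:0a",
              "50:05:07:68:00:00:00:0b"],
    "zone3": ["10:00:00:00:c9:00:00:01", "10:00:00:00:c9:00:00:02",
              "50:05:07:68:00:00:00:0b"],
}

violations = non_single_initiator_zones(planned_zones, hba_wwpns)
print("Zones that break the single-initiator rule:", violations)
```

In this sample, zone3 is reported because it holds two HBAs. A zone with one HBA and several target ports, such as zone2, passes.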
The following will demonstrate the steps necessary to make changes to the current zone configuration and assure a copy of the current zone DB is stored to the offline repository as a fallback if necessary.
The current Fabric Zone DB consists of only 1 zone configuration.
A request has come in to add an additional zone to the fabric; we will add this zone as Zone4. Updates to fabric zoning will always be made to the current active zone configuration in the Fabric Zone DB.
To assure that the Network Advisor zoning configuration window is current and assure you are viewing what is currently active in the fabric, perform a Zone DB Operation to refresh the DB. Verify the Zone DB listed is the Fabric Zone DB and perform a refresh.
Zone DB Operation > Refresh (See below)
You will receive a message indicating you are overwriting the selected zone DB with the one in the fabric, see below. Respond Yes; this guarantees that your current view of the Fabric Zone DB is what exists in the fabric.
You will now want to save a copy of the current Fabric Zone DB to the offline repository so that you have a copy to fall back to if necessary.
Zone DB Operation > Save As (See below)
You will receive a window and need to input a Zone DB Name that will be used to identify the copy of the active Fabric Zone DB you are saving to the offline repository. You should establish a standard naming convention to be used and assure it is enforced. In this example we are using the initials of the person making the change followed by the date the change is being made followed by the name of the active zone configuration.
Once you respond OK to save you will be presented with the following screen. VERY important at this point to notice that the Zone DB listed below is the Zone DB you just saved to the offline repository: RJP_120610_SANWEST_X_CURRENT.
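A naming convention is easier to enforce when the name is assembled by a helper rather than typed by hand. The following is a minimal sketch of the convention used in this example (initials, change date, active zone configuration name). The function name is hypothetical, and the date layout is an assumption: the guide does not state whether 120610 is read as YYMMDD or MMDDYY, so the format is a parameter and defaults to YYMMDD here.

```python
import re
from datetime import date


def zone_db_name(initials, change_date, active_config, date_format="%y%m%d"):
    """Build <INITIALS>_<date>_<ACTIVE_CONFIG> for an offline Zone DB copy.

    date_format is an assumption (YYMMDD by default); adjust it to match
    the convention your site agrees on.
    """
    if not re.fullmatch(r"[A-Za-z]{2,4}", initials):
        raise ValueError("initials should be 2 to 4 letters")
    if not re.fullmatch(r"[A-Za-z0-9_]+", active_config):
        raise ValueError("configuration name should use letters, digits and underscores")
    return f"{initials.upper()}_{change_date.strftime(date_format)}_{active_config}"


name = zone_db_name("rjp", date(2012, 6, 10), "SANWEST_X_CURRENT")
print(name)
```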
Now that you have saved a copy, you can make your updates to the active Fabric Zone DB. It is VERY important to now go back into the Fabric Zone DB, which is the Zone DB you always want to make your changes to.
Zone DB > Select Fabric Zone DB from the list (See below)
You will now see that the Fabric Zone DB is listed in the top middle of the screen. See below.
Now that you have saved a copy of the current active Fabric Zone DB to the offline repository and have made sure you are again editing the active Fabric Zone DB, you are ready to implement your change. For this example you will create a new zone, Zone4, add it to the current active zone configuration, and activate the zone configuration so that the change takes effect in the fabric. Create the new zone and name it Zone4.
New Zone > Type Zone4 as the name (See below)
Proceed to add the new members to the zone, see below.
Add the newly created Zone4 to the active configuration, see below.
Activate the zone configuration so that your changes are pushed to the fabric. You will be presented with a window that will display the changes you are getting ready to activate in the fabric. You need to VERIFY that these changes are correct and respond OK once you have completed the verification.
Highlight current zone configuration > Activate > Respond OK once you have verified the intended changes are accurate (See below)
You will now see that Zone4 is active in your fabric zone configuration, see below.
Should you realize a mistake was made, you can fall back to the zone configuration that you saved to the offline repository.
Zone DB > Select the Zone DB you wish to activate from the list, in this example it is RJP_120610_SANWEST_X_CURRENT (See below)
You will now see the Zone DB that you want to fall back to listed. Review the Zone Configuration to assure it is the version you wish to fall back to. Notice the yellow triangle in the Active Zone Configuration tab below. This is a warning to tell you that there is a difference between what is currently active in the fabric and the Zone DB that you are editing in Network Advisor.
Once you have verified that the fall back Zone Configuration is correct, proceed to activate.
Highlight the Zone Configuration you wish to activate > Click Activate (See below)
You will see a new window displaying the changes to the fabric that will be implemented. After you verify this is accurate, click OK and the changes will be activated in the fabric. You will need to reply YES to a verification window that comes up in order to activate the new configuration.
You will now see the active configuration no longer displays Zone4. This is the state you were in prior to making changes to add Zone4 to the configuration. The current Zone DB listed is the copy you saved in the offline repository.
You will want to refresh this screen by selecting the Fabric Zone DB to show what is currently active in the fabric.
Zone DB > Select the Fabric Zone DB (See below)
You will now see the Fabric Zone DB displayed, which shows you what is active in the fabric. You have successfully fallen back to the point you were at prior to beginning the changes.
As part of the procedure, you will be saving many copies of the Zone DB to the offline repository. You will want to establish a policy for cleaning up the offline repository. You should determine the number of copies to save and clean out older copies as necessary. To delete unwanted copies of the Zone DBs from the offline repository, select the Zone DB you wish to delete.
Zone DB > Select the Zone DB to be deleted from the list, in this example RJP_120210_SANWEST_X_CURRENT (See below)
You will now see the Zone DB RJP_120210_SANWEST_X_CURRENT listed in the Zone DB field.
You can now delete this Zone DB.
Zone DB Operation > Select Delete from the list (See below)
You will now receive a window indicating you are removing this Zone DB from the offline repository. Respond Yes to remove it from the offline repository.
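A retention policy of the form "keep the newest N copies" can be worked out from the Zone DB names alone when they follow the naming convention used in this example. The sketch below only produces a list of candidates for manual deletion in Network Advisor; it does not touch the repository. The function name and the third sample name are hypothetical, and the date field is assumed to be the second underscore-separated token in YYMMDD layout, which the guide does not confirm.

```python
from datetime import datetime


def copies_to_delete(names, keep=5, date_format="%y%m%d"):
    """Return the Zone DB names that fall outside the newest `keep` copies.

    Assumes names look like INITIALS_DATE_CONFIGNAME, with the date as the
    second underscore-separated field in the given format (an assumption).
    """
    dated = []
    for name in names:
        date_field = name.split("_")[1]
        dated.append((datetime.strptime(date_field, date_format), name))
    dated.sort(reverse=True)  # newest first
    return [name for _, name in dated[keep:]]


offline_copies = [
    "RJP_120610_SANWEST_X_CURRENT",
    "RJP_120210_SANWEST_X_CURRENT",
    "RJP_120410_SANWEST_X_CURRENT",  # hypothetical extra copy
]

stale = copies_to_delete(offline_copies, keep=2)
print("Candidates for deletion:", stale)
```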
Conclusion
This document was designed to provide guidance on deploying IBM Network Advisor per IBM Best Practices. Additionally, guidance for maintenance, monitoring, and performance has been included. This guide is not intended to replace any of the current documentation that IBM and Brocade have released in support of this product.
References
Below are links to references found in this document in addition to Network Advisor-specific links at Brocade and IBM.
Brocade Network Advisor SAN User Manual All the features, and their usage, in Network Advisor are described here
IBM Network Advisor Software Link to IBM Network Advisor overv