IBM Network Advisor Best Practices and Deployment Guide_v3.10


Version: 3.10
Owner: Jim Olson
Author: Eric Block, David Lutz & Sudharsan S Vangal

http://ibm.biz/brocdesignbp

Table of Contents

Document history
Document Location
Approvals
Distribution
Introduction
When to use Network Advisor
Best Practices Recommendations
Regular Tasks for SAN Health
Daily
Weekly
Monthly
Quarterly
Network Advisor
Server Sizing and Configuration
Server and Client Ports
Downloading IBM Network Advisor
Installing IBM Network Advisor
Launching the Remote Client
User Account Management
Server Management Console
IBM Network Advisor Configuration Screen
Backup and Restore Configuration Data
Switch Backup and Restore
Restoring a switch configuration for a selected device
Scheduling Switch Backups
Server Data Backup and Restore
Viewing the backup status
Server Data Restore
Event Logs
Collect SupportSave
Network Advisor Supportsave
Supportsave Manual Collection
Supportsave Scheduled Collection
Event notification
Call Home
SNMP
Fabric Watch
Reasons to Implement Fabric Watch
Configuring Fabric Watch
Bottleneck Credit Tools
Enabling bottleneck credit tools
Bottleneck Detection
Recommendations
Suggested Bottleneck Settings
FOS 6.3
FOS 6.4
FOS 7.0
Implementation
Enable Bottleneckmon via GUI
Enable Bottleneckmon via CLI
How Bottlenecks are reported in Network Advisor
Port Fencing
Implementation
Adding thresholds (Violation types):
Assigning thresholds to ports:
Unblocking a Port
Removing Thresholds
Brocade Fabric Vision
Monitoring and Alerting Policy Suite (MAPS)
MAPS Licensing Requirements and Software Prerequisites
Differences between Fabric Watch and MAPS configurations
Converting from Fabric Watch to MAPS
Initial MAPS setup
Importing MAPS configuration
Replicating a policy to other devices
MAPS and Bottleneck Monitor
Enable MAPS in Network Advisor
Activate MAPS Policy from Network Advisor
View the Parameters in a Policy
Network Advisor Dashboards
Brocade SAN Health Report
Instructions For Usage
Zoning
Conclusion
References

Document history

Document Location

The source of the document can be found in the Team Room, located at:

Database Name: TBD
Server Name: TBD
File Name: IBM Network Advisor Deployment Guide V3.03.doc
Please address any questions to:

Revision History

Date of this revision: 01/16/2014
Date of next revision: TBD

Revision Number | Revision Date | Summary of Changes | Changes Marked
1.0 | 6/11/12 | Initial document creation | No
1.1 | | |
1.2 | 7/10/12 | Revised to meet requirements for standardized deployment | No
1.3 | 7/25/12 | Added SNMP and Performance information | No
1.4 | 7/29/12 | Added Zoning information | No
1.5 | 8/1/12 | Edited document for added emphasis on key points, as well as alterations to technical terms, per Art Scrimo | No
1.6 | 9/14/12 | Added SAN Health information to Health Check section. Added information to Fault Management and SNMP section. Added Event Logs section. Added Switch Backup and Restore. | No
1.7 | 9/24/12 | Removed Linux from Network Advisor server options | No
1.8 | 9/25/12 | Incorporated Best Practices into guide | No
1.9 | 10/1/12 | Expanded on SAN Health Tool section | No
2.0 | 10/4/12 | Moved Security and Authentication to SAN Design Guide | No
2.1 | 10/24/12 | Added User Account Management section | No
2.2 | 10/30/12 | Removed duplicate switch recovery information. Edited overall content for flow/clarity | No
2.3 | 11/6/12 | Added links for navigating document more efficiently | No
2.4 | 11/15/12 | Added Reference section. Edited Bottleneck and Port Fencing sections for Network Advisor (vs CLI) | No
2.5 | 12/19/12 | Added information for SNMPv3, Call Home, Automatic Trace Dumps | No
2.6 | 1/14/13 | Minor edits to wording | No
2.7 | 3/15/13 | Updated Port Fencing information based on alert severity changing in FOS v7.0.2c (per John Juenemann 20130313 Update Initiative) | No
2.8 | 7/17/13 | Added additional detail/instruction for SAN Health usage | No
2.9 | 01/14/14 | Per Jim Olson and Kirby Dahman, changed Fabric Watch F_port Class thresholds to 25 for two alerts: Link Reset and State Change | No
2.91 | 01/16/14 | Modified appearance of Fabric Watch alerts table for better clarity/detail (no FW values changed) | No
3.0 | 05/28/14 | Added a new section for Flow Vision MAPS, pages 50-54 (updated as per Jim Olson's directive to include Fabric Vision) | No
3.1 | 06/01/14 | Added section for Fabric Vision introduction. Added table for MAPS Threshold Values | No
3.2 | 06/02/14 | Corrected MAPS implementation section for more clarity | No
3.3 | 06/03/14 | Added Moderate Policy also for the MAPS Threshold values. Corrected FOS version requirement for MAPS. | No
3.4 | 06/03/14 | Added section for Replicating Policies to Other devices as per Tron's request | No
3.5 | 06/05/14 | Corrected the Threshold policies. Added configuration screenshots for INA | No
3.6 | 06/17/14 | Provided more clarity on MAPS advantages and features over Fabric Watch | Yes
3.7 | 07/14/2014 | Included updates provided by David Lutz on Fabric Watch | Yes
3.8 | 08/15/2014 | Restructured document and created new section for recommendations. MAPS section revised to show recommended MAPS implementation; Fabric Vision section updated to provide better clarity. Section added for collecting supportsave files from Network Advisor, and updated the SAN regular tasks. | Yes
3.10 | 11/10/2014 | Added link to server configuration section for more detail | Yes

Approvals

This document requires the following approvals:

Name | Title
Jim Olson | Distinguished Engineer

Distribution

This document has been distributed to:

Name | Title
Jim Olson | Distinguished Engineer
Ann Corrao | Distinguished Engineer
John Juenemann | Senior Technical Staff Member (STSM)
Karen Haberli | Program Manager
Eric Block | Storage Architect
Sudharsan S Vangal | Storage Administrator

Introduction

The purpose of this document is to present a set of guidelines that incorporate IBM best practices for deploying IBM Network Advisor (a.k.a. Brocade Network Advisor). This guide should act as a reference point for establishing consistent, standard deployments across IBM environments.

The best practices noted in this guide cover some of the more advanced features of Brocade Fabric OS (FOS), for example Fabric Watch, Bottleneck Detection, and Port Fencing. Additional best practices are provided for hardware selection, zoning, and performing scheduled health-related checks and tasks in the SAN.

The guidance found in this document should provide you with an efficient, economic, and effective process by which to deploy and begin managing IBM Network Advisor.

NOTE: All deployments should use the Enterprise version of IBM Network Advisor.

When to use Network Advisor

All SAN fabric installations using Brocade technology should deploy IBM Network Advisor.

If you are currently managing your Brocade SAN with DCFM, you should upgrade to Network Advisor in the following cases:

All 16Gb installations (or prior to upgrading to 16Gb)

Prior to upgrading any Brocade FOS product to level 7.x or above

NOTE: DCFM is not qualified or supported for management of switches running FOS v7.0 and later firmware. If you are planning to upgrade devices to FOS v7.0, you must first upgrade DCFM to Network Advisor 11.1 or later, or you risk losing management connectivity.

Best Practices Recommendations

The following recommendations are based on best practice recommendations from Brocade and IBM technical support groups.

Install and use Network Advisor to manage all switches. See Network Advisor Setup.

Back up switch configurations. See Backup and Restore Configuration Data.

Enable Bottleneck Credit Recovery Tools. See Bottleneck Credit Tools.

Configure Call Home and SNMP or email event notification. See Event notification.

For switches running FOS 7.2 or higher, set up MAPS. See Monitoring and Alerting Policy Suite (MAPS).

For switches running FOS 7.1 or lower, set up Fabric Watch. See Fabric Watch.

Configure and enable Bottleneck Detection. See Bottleneck Detection.

Configure Network Advisor Dashboards. See Network Advisor Dashboards.

Implement and follow regular SAN health tasks. See Regular Tasks for SAN Health.

Regular Tasks for SAN Health

NOTE: The tasks below should be considered mandatory in any Brocade SAN environment. Consistent execution of these tasks helps ensure that your fabrics are operating optimally and that you have adequate backup data available if the SAN is unexpectedly impacted. Performing these tasks also provides information that is extremely useful for recognizing trends and for pinpointing the sources of problems during troubleshooting.

Daily

Review of Event Logs
The operations team should review the Master Log daily as part of the health check process. Network Advisor's Master Log lists all events and alerts that have occurred in the SAN.

To view specific logs, select an option from the Monitor menu's Logs submenu. The following logs are available there: Audit Log, Product Event Log, Fabric Log, FICON Log, Product Status Log, Security Log, and Syslog Log. Fabric Watch, MAPS, Bottleneck Detection, and Port Fencing alerts are processed like other alerts in the environment and appear in the IBM Network Advisor Master Log.

Weekly

Back Up Switches
Collect a set of configuration files in case they are required to restore the switch configuration. See the Switch Backup and Restore section for how to do this.

Collect Supportsaves

Collect a complete set of supportsave files from all switches before clearing the switch counters.

This provides a set of switch logs from before the counters were cleared, in case they are required for problem determination (PD).

It also provides a set of switch logs that can be used as a baseline.

See Supportsave Scheduled Collection.

Clear Switch Counters
Counters that are never cleared make troubleshooting difficult, because there is no frame of reference for when the error counters on ports actually increased.

For this reason, the Brocade best practice is to clear the counters on a known schedule, so that any errors counted are known to represent recent issues.

NOTE: Any time new devices are added to the SAN or cabling changes are made, it is common for ports to detect errors. Clear these errors whenever fabric changes are made.

Action
Automate a counter clear on all switches that runs on Sunday evening (suggested: 6 PM local time). This should happen after all normally scheduled weekend changes are complete, and before production Sunday night / Monday morning workloads begin to hit the production system.

Commands to be run:

statsclear

slotstatsclear
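The weekly schedule above can be sketched in Python. This minimal example only computes the next Sunday 6 PM run time and lists the two FOS commands from the text; how the commands are actually pushed to each switch (Network Advisor, a CLI session, or another scheduler) is deliberately left out, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

SUNDAY = 6       # datetime.weekday(): Monday is 0, Sunday is 6
CLEAR_HOUR = 18  # 6 PM local time, as suggested above

# FOS commands listed above, to be run on every switch.
CLEAR_COMMANDS = ["statsclear", "slotstatsclear"]


def next_counter_clear(now: datetime, hour: int = CLEAR_HOUR) -> datetime:
    """Return the first Sunday at hour:00 that is strictly later than `now`."""
    days_ahead = (SUNDAY - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        # Today is Sunday and the window has already passed: use next week.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    run_at = next_counter_clear(datetime.now())
    print(f"Next counter clear: {run_at:%A %Y-%m-%d %H:%M}")
    for command in CLEAR_COMMANDS:
        print(f"  run on each switch: {command}")
```

Computing the time explicitly (rather than "every 7 days from now") keeps the clear anchored to Sunday evening even if the scheduler is restarted mid-week.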

Monthly
Review switch logs for marginal links or other potential switch issues. The following are some of the key metrics to check when reviewing supportsave files.

PORTERRSHOW

c3timeout / disc c3
Frame discards occur when frames sit in the frame buffers too long, which indicates problems sending the frames.
Note: On older switch code levels, running portstatsshow on any port with C3 discards may be required to determine whether the discards are tx or rx.
Action: tx discards are frames that cannot be sent to the attached device; check for link issues, then check the attached device. rx discards are frames that cannot be sent to the next hop in the switch; check whether other ports on the switch have tx discards, and use the framelog command to determine the destination of the rx frame discards.

crc_err
This counter is incremented when a frame with a bad CRC passes through the port.
Action: Determine where the CRC error originated by checking other ports and other switches for crc g_eof errors.

crc g_eof
This counter is incremented when a frame with a CRC error is detected while its end-of-frame delimiter is still good, which means this port is the first to detect the CRC error.
Action: Typically caused by an optical issue, often the cables. Check the cables and replace or swap them if necessary. Replace the optics (HBA, SFP) on the attached device.

too shrt / too long / bad eof
These counters indicate frame errors.
Action: Typically caused by an optical issue, often the cables. Check the cables and replace or swap them if necessary.

loss sync / loss sig
Loss of sync and loss of signal typically occur when the optical link cycles, usually at the attached device.
Action: Typically no action is required unless counts are extremely high or occur at unexpected times.
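The PORTERRSHOW guidance can be condensed into a small lookup, as a sketch. It assumes the counters for one port have already been read into a dictionary keyed by the counter names used above; it does not parse porterrshow output, whose layout varies by FOS level. The threshold for "extremely high" loss sync/loss sig counts is a hypothetical value to tune for your environment.

```python
from typing import Dict, List

# Condensed from the PORTERRSHOW guidance above. Keys are the counter names
# used in this guide; adjust them to the output of your FOS level.
ADVICE = {
    "disc c3": ("C3 discards: use portstatsshow to split tx from rx. "
                "tx: check the link, then the attached device. "
                "rx: check other ports for tx discards; use framelog."),
    "crc g_eof": ("First port to see the CRC error: check or swap cables, "
                  "then replace optics (HBA, SFP) on the attached device."),
    "crc_err": ("CRC errors passing through: trace the source via "
                "crc g_eof on other ports and switches."),
    "too shrt": "Frame errors: typically optics, often cables.",
    "too long": "Frame errors: typically optics, often cables.",
    "bad eof": "Frame errors: typically optics, often cables.",
    "loss sync": "Link cycling at the attached device: act only if very high.",
    "loss sig": "Link cycling at the attached device: act only if very high.",
}

# Counters that are normally harmless unless "extremely high".
NOISY = {"loss sync", "loss sig"}


def triage(counters: Dict[str, int], noisy_threshold: int = 100) -> List[str]:
    """Return one finding per counter that needs attention on a single port.

    Because counters are cleared weekly, any non-zero error counter is
    treated as recent. `noisy_threshold` is a hypothetical cut-off.
    """
    findings = []
    for name, advice in ADVICE.items():
        count = counters.get(name, 0)
        limit = noisy_threshold if name in NOISY else 0
        if count > limit:
            findings.append(f"{name}={count}: {advice}")
    return findings
```

Findings come back in the order of the table above, so a port showing both crc g_eof and crc_err lists the originating-port advice first.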

SFPSHOW
The primary metric is Rx power, which shows the amount of light the SFP is receiving.

SFPs typically transmit at around -2 to -3 dBm (approximately 630 to 500 µW), so for short cable runs, receive power levels should be similar. Longer cable runs result in lower receive light levels, which is not considered an issue. In general, receive levels should not drop below -10 dBm (100 µW) unless the cable run is extremely long.

In general, compare light levels to those of other cable runs of similar length; levels noticeably lower than on comparable cables indicate a cabling issue.
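The dBm and microwatt figures above are related by P(µW) = 1000 * 10^(dBm/10). The following sketch converts between the two and applies the two checks described: the general -10 dBm floor and a comparison against similar-length cable runs. The 3 dB margin used for "noticeably lower" is an assumption, not a Brocade figure.

```python
import math
import statistics
from typing import Sequence


def dbm_to_uw(dbm: float) -> float:
    """Convert optical power from dBm to microwatts (0 dBm = 1 mW = 1000 uW)."""
    return 1000.0 * 10 ** (dbm / 10.0)


def uw_to_dbm(uw: float) -> float:
    """Convert optical power from microwatts to dBm."""
    return 10.0 * math.log10(uw / 1000.0)


def rx_power_ok(rx_uw: float, floor_dbm: float = -10.0) -> bool:
    """True if receive power is at or above the general -10 dBm (100 uW) floor."""
    return uw_to_dbm(rx_uw) >= floor_dbm


def weaker_than_peers(rx_uw: float, peers_uw: Sequence[float],
                      margin_db: float = 3.0) -> bool:
    """True if this link is noticeably weaker than similar-length peer links.

    `margin_db` is a hypothetical definition of "noticeably lower".
    """
    baseline = statistics.median([uw_to_dbm(p) for p in peers_uw])
    return uw_to_dbm(rx_uw) < baseline - margin_db
```

Comparing in dBm rather than µW keeps the margin proportional: a 3 dB drop means roughly half the light, whatever the absolute level.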

ERRDUMP

Review the errdump log for messages that indicate issues. These range from CDR-xxxx, C2-xxxx, and C3-xxxx messages indicating credit loss, to messages showing excessive network login attempts, to switch hardware issues.
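As a sketch, the credit-loss message families named above can be counted in saved errdump text with a regular expression. This assumes each message ID appears in the text in the PREFIX-NNNN form (for example C3-xxxx); it matches only the CDR, C2, and C3 prefixes from the text and does not interpret what any individual message means.

```python
import re
from collections import Counter
from typing import Iterable

# Message-ID families the text associates with credit loss. Assumes each ID
# appears in the errdump text as PREFIX-NNNN (for example "C3-xxxx").
CREDIT_LOSS_ID = re.compile(r"\b(?:CDR|C2|C3)-\d{4}\b")


def credit_loss_messages(lines: Iterable[str]) -> Counter:
    """Count occurrences of each CDR/C2/C3 message ID in errdump output."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(CREDIT_LOSS_ID.findall(line))
    return counts
```

Running this against each month's supportsave output and comparing the counts is one way to spot a port or switch whose credit-loss messages are trending upward.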

FABRICLOG
Check the fabric log for signs of ports performing repeated link resets, ports going offline/online, or repeated fabric rebuilds.

Quarterly
Run the Brocade SAN Health Report. See Brocade SAN Health Report.

Network Advisor

Server Sizing and Configuration

IBM Network Advisor Sizing Requirements

Requirement | Small | Medium | Large
Number of Fabrics | 8 | 16 | 24
Number of Domains | 20 | 60 | 120
Number of Switch Ports | 2000 | 5000 | 9000
Number of Device Ports | 5000 | 10000 | 20000
Number of Access Gateways | 20 | 30 | 40
Server CPU | Dual Core 2GHz | Quad Core 2GHz | Quad Core 2GHz
Server Memory | 6GB | 8GB | 12GB
Server Disk (OS) | 60GB | 80GB | 100GB
Server Disk (App/DB) | 100GB | 100GB | 100GB
Server Disk (Backup) | 100GB | 100GB | 100GB
Server Operating System | Windows 2008 R2 64-bit | Windows 2008 R2 64-bit | Windows 2008 R2 64-bit
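When choosing a column in the sizing table, every environment metric should fit within the chosen tier. The sketch below encodes the table, treating each value as an upper limit (an interpretation of the table, not a stated Brocade rule), and returns the smallest tier that covers all metrics; the metric key names are hypothetical.

```python
from typing import Dict, Optional

# Upper limits per tier, taken from the sizing table above.
TIERS: Dict[str, Dict[str, int]] = {
    "Small": {"fabrics": 8, "domains": 20, "switch_ports": 2000,
              "device_ports": 5000, "access_gateways": 20},
    "Medium": {"fabrics": 16, "domains": 60, "switch_ports": 5000,
               "device_ports": 10000, "access_gateways": 30},
    "Large": {"fabrics": 24, "domains": 120, "switch_ports": 9000,
              "device_ports": 20000, "access_gateways": 40},
}

SERVER_MEMORY_GB = {"Small": 6, "Medium": 8, "Large": 12}


def pick_tier(env: Dict[str, int]) -> Optional[str]:
    """Return the smallest tier whose limits cover every metric in `env`.

    Missing metrics count as 0; unknown keys are ignored.
    Returns None if the environment exceeds the Large tier.
    """
    for tier, limits in TIERS.items():  # insertion order: Small first
        if all(env.get(metric, 0) <= limit for metric, limit in limits.items()):
            return tier
    return None
```

A single metric can push the whole deployment up a tier: for example, a small number of fabrics with many domains still needs the Medium or Large server.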

For further information on server sizing and configuration, see the Network Advisor installation guide (http://www.brocade.com/downloads/documents/product_manuals/NetworkAdvisor/NetworkAdvisor_InstallGd_v1230.pdf).

Additional Requirements
We want to do everything we can to prevent issues in the SAN from impacting our management interface. Should the SAN experience an unexpected degradation or failure, we need to ensure that our ability to access Network Advisor is unaffected. This ability could be severely compromised or lost if our main tools (OS, application) reside on the SAN. Therefore, follow these points when performing a best practice installation of the IBM Network Advisor server:

Dedicated / Stand-alone server

NOTE: A virtualized server may be used; however, it must meet the same requirements as a dedicated/stand-alone server.

The server must be dedicated for Network Advisor

No other applications installed/running

The server OS must not boot from SAN

Install OS on local disk (internal to server)

Network Advisor must not be installed on SAN

Install Network Advisor Server/DB on local disk (internal to server)

Server should be partitioned for three drives: one for the OS, one for the Application, and one for Backup Data

Backup data must be on a physically separate drive

Browser Requirements

Firefox under Windows

Oracle JRE 1.6.0 update 24 for Network Advisor and Web Tools

Server and Client Ports

The Management application has two parts: the Server and the Client. The Server is installed on one machine and stores device-related information; it does not have a user interface. To view information through a user interface, you must log in to the Server through a Client. The Server and Clients may reside on the same machine, or on separate machines.

In some cases, a network may use virtual private network (VPN) or firewall technology, which can prohibit communication between switches and the Servers or Clients. For example, a Server or Client can discover a switch that appears to log in but is immediately logged out because the switch cannot reach the Server or Client. To resolve this issue, check whether the ports in the table below need to be opened in the firewall.

Port Number (notes) | Ports | Transport | Description | Communication Path | Open in Firewall
20 (1) | FTP Port (Control) | TCP | FTP control port for internal FTP server | Client-Server, Switch-Server | Yes
21 (1, 2) | FTP Port (Data) | TCP | FTP data port for internal FTP server | Client-Server, Switch-Server | Yes
22 (1) | SSH or Secure Telnet | TCP | Sectelnet port from server to switch/client to switch | Server-Switch, Client-Switch | Yes
23 (1) | Telnet | TCP | Telnet port from server/client to switch | Server-Switch, Client-Switch | Yes
25 | SMTP Server Port | TCP | SMTP server port for email communication | Server-SMTP Server | Yes
49 | TACACS+ Authentication Port | TCP | TACACS+ server port for authentication if TACACS+ is chosen as an external authentication | Server-TACACS+ Server | Yes
80 | Jboss.web.http.port | TCP | Non-SSL HTTP/1.1 connector port | Client-Server | Yes
80 (3, 4) | Switch http | TCP | Switch non-SSL http port for http and CAL communication | Server-Switch, Client-Switch | Yes
161 (1) | SNMP Port | UDP | Default SNMP port | Server-Switch | Yes
162 (3) | Snmp.trap.port | UDP | Default SNMP trap port | Switch-Server | Yes
389 | LDAP Authentication Server Port | TCP | LDAP server port for authentication if LDAP is chosen as an external authentication | Server-LDAP Server | Yes
443 (3, 4, 5) | Switch https | TCP | Switch SSL http port for https and CAL communication | Server-Switch, Client-Switch | Yes
514 (6) | Syslog Port | UDP | Default syslog port | Switch-Server | Yes
636 | LDAP Authentication SSL Port | TCP | LDAP server port for authentication if LDAP is chosen as an external authentication and SSL is enabled | Server-LDAP Server | Yes
1024 (1, 7) | MPI | TCP | MPI trap recipient port | Switch-Server | Yes
1812 | RADIUS Authentication Server Port | UDP | RADIUS server port for authentication if RADIUS is chosen as an external authentication | Server-RADIUS Server | Yes
2048 (1, 9) | MPI | TCP | MPI discovery NMRU port | Server-Switch | Yes
2049 (1, 5, 7, 9) | MPI | TCP | MPI discovery NMRU port for SSL | Server-Switch | Yes
2638 (8) | Database port (enforced during install) | TCP | Port used by database | Server-Database, Remote-ODBC-Database | Yes
4430 (1, 5, 7) | MPI | TCP | XML-RPC port for SSL | Server-Switch | Yes
5988 | SMI Agent port | TCP | SMI Agent port | SMI Agent-Server-Client | Yes
5989 | SMI Agent port with SSL enabled | TCP | SMI Agent port with SSL enabled | SMI Agent-Server-Client | Yes
8080 (1, 7) | MPI | TCP | XML-RPC port/HTTP port | Server-Switch | Yes
24600 (11) | Jboss.naming.jnp.port-port 0 | TCP | Bootstrap JNP service port | Client-Server | Yes
24601 | Jboss.connector.ejb3.port-port 1 | TCP | EJB3 connector port | Client-Server | Yes
24602 | Jboss.connector.bisocket.port-port 2 | TCP | Bisocket connector port | Client-Server | Yes
24603 | Jboss.connector.bisocket.secondary.port-port 3 | TCP | Bisocket connector secondary port | Client-Server | Yes
24604 (5) | Jboss.connector.sslbisocket.port-port 4 | TCP | SSL bisocket connector port | Client-Server | Yes
24605 (5) | Jboss.connector.sslbisocket.secondary.port-port 5 | TCP | SSL bisocket connector secondary port | Client-Server | Yes
24606 | Smp.registry.port-port 6 | TCP | RMI registry port | Client-Server | Yes
24607 | Smp.server.export.port-port 7 | TCP | RMI export port | Client-Server | Yes
24608 | Smp.server.cliProxyListeningport-port 8 | TCP | CLI proxy telnet port | Client-Server | Yes
24609 | Jboss.naming.rmi.port-port 9 | TCP | RMI naming service port | Client-Server | Yes
24610 | Jboss.jrmp.invoker.port-port 10 | TCP | RMI/JRMP invoker port | Client-Server | Yes
24611 | Jboss.pooled.invoker.port-port 11 | TCP | Pooled invoker port | Client-Server | Yes
24612 | Jboss.connector.socket.port-port 12 | TCP | Socket invoker port | Server | No
24613 | Jboss.web.ajp.port-port 13 | TCP | AJP 1.3 connector port | Server | No
24614 | Jboss.web.service.port-port 14 | TCP | Web service port | Server | No
24615 | Connector.bind.port-port 15 | TCP | Port to listen for requests | Server | No
32768-65535 | Ephemeral ports | UDP | Ephemeral transport protocol ports | Switch-Server | Yes
55555 (10) | Client Export Port | TCP | Client port to which the server pushes the M-EOS device Element Manager updates | Server-Client | Yes
55556 | Launch in Context (LIC) client handshaking port | TCP | Client port used to check if a Management application client opened using LIC is running on the same host. NOTE: If this port is in use, the application uses the next available port | Client | No

Notes to port superscripts:

1 Port is not configurable (either in the switch or the Management server).

2 Every FTP session requires an additional, randomly chosen port. If the firewall is enabled and blocks these ports, FTP operations (used for firmware download, technical support, firmware import from client to server, and so on) will fail.

3 Ports configurable in the switch and the Management server. Port must be the same for all switches managed by the Management server.

4 Ports used to launch the Web Tools application for Fabric OS switches from the Management client. This is applicable only when the Fabric OS version is earlier than 6.1.1.

5 Port used for SSL communication. If SSL is enabled, you must open 443*, 24604, and 24605 in the firewall. If SSL is not enabled, port 80* must be open in the firewall and 443*, 24604, and 24605 can be closed. An asterisk (*) denotes the default web server port number. If you set the web server port number to a port other than the default, you must open that port in the firewall.

6 The Syslog listening port is configurable in the Management server. The switch always sends syslog messages to port 514. If you have any other syslog daemon on the Management server machine already listening to 514, then the Management Server can be configured to listen to a different port. You must manually configure relay in existing syslogd to forward the syslog messages to the Management Server listening on the configured port.

7 Ports used for communicating with M-EOSn (M-i10K) directors. M-i10K always uses NMRU over SSL (2049). M-i10K always uses 8080 for http requests (firmware download, configuration backup/ restore, data collection). If M-EOSn firmware version is less than 9.1 the Management application uses 8080 for XML-RPC requests (discovery and asset collection). If the M-EOSn firmware version is more than 9.1 then it always uses SSL port (4430) for XML-RPC.

8 Port must be opened in firewall for the server when the remote ODBC client needs to talk to the Management database server (Only for EE). The same port is used by the Management server to database server (local). This is not used by the Management client.

9 Ports used for communicating with M-EOS (excluding M-i10K) switches (only required when the Management server manages M-EOS switches).

10 Port should be opened in firewall in the Management client to allow communication between server and client (only applicable for M-EOS switches). If this port is not opened in the firewall, then the M-EOS element manager does not receive updates. Also if multiple clients are opened, it will try to use the next available port (55556). So if there are n clients opened in the same machine then you must open 55555 (configurable) to 55555 + n ports in the firewall.

11 The Management server tries to find a contiguous block of 16 ports from the starting port configured (for example, 24600); if any port in this range is not available for the Management application, then you must provide a new starting port. Note that Port 1 to Port 15 in Ports column of the table above are not separately configurable and those ports vary based on the starting port number configuration (specified as Port 0 in the above table). The port numbers mentioned in the table above are the default ports (for example, when 24600 is selected as the starting port number).
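Notes 5, 10, and 11 determine which server and client ports to open. The following sketch derives them from the configured starting port; the function names are hypothetical, and it follows Note 10 literally (55555 through 55555 + n) rather than guessing at the exact per-client allocation.

```python
from typing import List, Optional

BLOCK_SIZE = 16           # "Port 0" to "Port 15" (note 11)
CLIENT_SERVER_SLOTS = 12  # Port 0 to Port 11 are marked "Open in Firewall: Yes"
SSL_SLOTS = (4, 5)        # sslbisocket ports, needed only with SSL (note 5)


def server_port_block(start: int = 24600) -> List[int]:
    """All 16 contiguous ports the server reserves from the starting port."""
    return list(range(start, start + BLOCK_SIZE))


def client_server_firewall_ports(start: int = 24600, ssl_enabled: bool = True,
                                 web_port: Optional[int] = None) -> List[int]:
    """Client-to-Server TCP ports to open, following notes 5 and 11."""
    # Default web server port: 443 with SSL, 80 without (note 5).
    web = web_port if web_port is not None else (443 if ssl_enabled else 80)
    block = [start + i for i in range(CLIENT_SERVER_SLOTS)
             if ssl_enabled or i not in SSL_SLOTS]
    return sorted({web, *block})


def client_export_ports(n_clients: int, base: int = 55555) -> List[int]:
    """Note 10 taken literally: open base through base + n for n clients."""
    return list(range(base, base + n_clients + 1))
```

For example, with SSL disabled and the default starting port, the list contains port 80 plus 24600 to 24611 without 24604 and 24605.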

Downloading IBM Network Advisor

The following link may be used to access IBM Network Advisor software:

http://www-03.ibm.com/systems/networking/switches/san/b-type/na/index.html

1. Under Learn more, select the IBM Network Advisor Trial web page.

2. You are redirected to the ibm.brocadeassist.com site.

3. In the Product Downloads window, expand Brocade Network Advisor 11.1.x and select the current recommended version to download

Installing IBM Network Advisor

The following provides screenshot-by-screenshot guidance for an installation of the IBM Network Advisor (Enterprise edition).

1. Once you have downloaded the application, select the executable file and click Install. This brings up the Introduction screen...

2. Accept License...

3. Select Install Folder (do not install to the root directory, usually C:\)...

4. Note Pre-Install Summary and select Install...

5. Once installation is complete, click Done to complete the Network Advisor configuration...

6. IBM Network Advisor Configuration Welcome screen...

7. We are performing a new install and are not migrating any data or settings, so select No...

8. Select SAN with SMI Agent

9. You will need to have a Serial Number and License Key available at this point if you plan to perform a permanent install (these should have been provided when you purchased IBM Network Advisor). Otherwise, you can opt for a 75-day trial...

10. Enter required Serial and License Key...

11. As part of the Standard Deployment, we will select Internal FTP Server...

12. Add required information...

13. Most configurations will maintain the below defaults...

14. Most configurations will keep the defaults. However, these settings can be changed later via the Server Management Console (in the Services tab), described below.

15. Select the network size based on the scaling you used to size your server...

16. Verify your configuration...

17. At this point installation/configuration is complete and you are ready to start the client...

18. Server and Client startup...

19. After the initial login, you will need to change the Administrator password from the default. Once logged in, you can do this from Server > Users.

Launching the Remote Client

To launch a remote client, complete the following steps:

1. Open a web browser and enter the IP address of the Management application server in the Address bar. The Management application web start screen displays.

The default web server port number is 80; if SSL is enabled, it is 443. You must enter the web server port number in addition to the IP address (e.g. IP_Address:Port_Number).

2. Click the Management application web start link.

The Log In dialog box displays.

3. Enter your user name and password.

The defaults are Administrator and password, respectively. If you migrated from a previous release, your username and password do not change.

4. Select or clear the Save password check box to choose whether you want the application to remember your password the next time you login.

5. Click Login.

6. Click OK on the Login Banner dialog box. The Management application displays.

User Account Management

Centralized authentication is the IBM best practice for managing user accounts. Regardless of which authentication method you use (RADIUS, TACACS+, LDAP, or local), you will need to work with your security team to ensure you are meeting account and IBM requirements.

ITCS104

The ITCS104 Technical Security Standards for SAN Switches may be found here.

User Management

IBM Network Advisor provides a thorough role-based access control (RBAC) feature to define detailed roles and privileges for SAN administrators. The User Management feature:

Provides current authentication and authorization configuration details

Provides a consolidated list of user profiles, roles, and areas of responsibility (AOR)

Provides options to add, modify, or duplicate user profiles, roles, and AORs

Shows, in the Account State column, whether each account is active or the reason it is locked out

Restricts access to users assigned the User Management privilege with Read-Only or Read-Write permission

Places no limit on the number of users added to Brocade Network Advisor; the number of users depends on the database storage limit

Local authentication (local password database), Windows domain login, LDAP, RADIUS, and TACACS+ are supported. Automatic failover to a secondary authentication method can be configured in case a remote primary authentication method becomes unavailable.

Privileges: Provide access to the features in the Management application.

Role: A group of selected privileges. A role can be assigned to one or more Management application users who need access to the same menu options.

AOR (Area of Responsibility): Defines device access permissions for a user. AORs can group fabrics, hosts, and other products, and can be modified, deleted, or duplicated.
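The relationship between privileges, roles, and AORs can be modeled as follows. This is a hypothetical sketch of the access rule described above, not Network Advisor's implementation: a user may act on a fabric or host only if one of the user's roles grants the privilege and one of the user's AORs contains that target. The privilege and fabric names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Role:
    """A named group of privileges; each maps to "RO" or "RW"."""
    name: str
    privileges: Dict[str, str]


@dataclass
class AOR:
    """Area of Responsibility: the fabrics and hosts a user may act on."""
    name: str
    fabrics: Set[str] = field(default_factory=set)
    hosts: Set[str] = field(default_factory=set)


@dataclass
class User:
    name: str
    roles: List[Role]
    aors: List[AOR]


def allowed(user: User, privilege: str, target: str, write: bool = False) -> bool:
    """Privilege must come from a role AND the target must be in an AOR."""
    in_scope = any(target in a.fabrics or target in a.hosts for a in user.aors)
    if not in_scope:
        return False
    levels = {role.privileges.get(privilege) for role in user.roles}
    return "RW" in levels if write else bool(levels & {"RO", "RW"})
```

Separating "what" (roles) from "where" (AORs) is what lets one role, such as a zoning administrator, be reused across teams that each own different fabrics.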

Default and User-defined Accounts

In addition to the default accounts (root, factory, admin, and user), Fabric OS supports up to 252 additional user-defined accounts in each logical switch (domain). These accounts expand your ability to track account access and audit administrative activities. See the Fabric OS Administrator's Guide below for in-depth detail on setting up these accounts.

NOTE: The default user accounts (root, factory, admin, and user) need to be properly secured. Change the default passwords for root and factory and keep these separate and secure. The root and factory accounts provide a level of access beyond the admin account.

Work with your security team in securing and managing the Root and Factory accounts

Work with your security team to define non-default Admin and User accounts with the same access for your users

Disable the default Admin and User accounts

AAA (Authentication, Authorization, and Accounting) Settings

The Authentication function enables you to configure an authentication server and establish authentication policies. Authentication is configured to the local database by default. If you configure primary authentication to a RADIUS server, a TACACS+ server, an LDAP server, or switch authentication, you can also configure secondary authentication to the local database. When you log in to the Management application, if the primary server is unavailable, the Management application tries the next configured primary server. If all primary servers are unavailable, the Management application falls back to secondary authentication. Fallback can occur when the server is unavailable, authentication fails, or the user is not found.
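The primary/secondary fallback order described above can be sketched as follows. The check functions stand in for real RADIUS, TACACS+, LDAP, or local lookups, and `fallback_on` is a hypothetical parameter modeling whether a failed authentication or unknown user (not just an unavailable server) also triggers fallback.

```python
from enum import Enum
from typing import Callable, FrozenSet, Sequence, Tuple


class Result(Enum):
    SUCCESS = "success"
    UNAVAILABLE = "unavailable"
    AUTH_FAILED = "auth_failed"
    USER_NOT_FOUND = "user_not_found"


# (server label, check function) pairs; the check functions are stand-ins
# for real RADIUS/TACACS+/LDAP/local lookups.
Server = Tuple[str, Callable[[str, str], Result]]


def authenticate(user: str, password: str, primaries: Sequence[Server],
                 secondary: Server,
                 fallback_on: FrozenSet[Result] = frozenset()) -> Tuple[str, Result]:
    """Try each primary in order, then fall back to the secondary method."""
    for label, check in primaries:
        result = check(user, password)
        if result is Result.SUCCESS:
            return label, result
        if result is Result.UNAVAILABLE:
            continue  # try the next configured primary server
        if result in fallback_on:
            break  # configured to fall back on this failure type
        return label, result  # definitive rejection from a primary server
    # Every primary was unavailable, or a fallback condition was hit.
    label, check = secondary
    return label, check(user, password)
```

Keeping the local database as the secondary method means administrators can still log in during an outage of the central authentication servers, which is why the local admin credentials must still be secured.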

Authentication can be configured through the Network Advisor Server Management Console. See the Server Management Console section of the Network Advisor User Manual for details on setting up RADIUS, TACACS+, LDAP, and other authentication methods.

Server Management Console

The Server Management Console (Start > Programs > IBM Network Advisor 11.1.x > Server Management Console) may be used to restart services, change port settings, restore data, and upload technical support information. We will go through a few of these in the screenshots that follow...

From the Services tab, you can start, stop, refresh, and restart services on the server.

From the Ports tab, you can change the Management application server or web server port numbers.

From the AAA Settings tab, you can configure different authentication methods (such as LDAP or RADIUS) and establish authentication policies.

From the Restore tab, you can restore server application data. (Server backups are configured in the application under Server > Options > Server Backup.)

NOTE: The Restore Path is the backup path you set in the Server Data Backup section (E:\Backup).

From the Technical Support Information tab, you can collect information for technical support.

IBM Network Advisor Configuration Screen

If you need to change any of the settings on the configuration screen below, you can access it via Start > Programs > IBM Network Advisor 11.1.x > IBM Network Advisor Configuration.

Backup and Restore Configuration Data

Switch Backup and Restore

Saving switch configurations
Saving the switch configuration is supported only on Fabric OS switches. To save the configuration of more than one switch at a time, you must have the Enhanced Group Management license.

Configuration files are uploaded from the selected switches and stored in individual files. Files are named with the convention cfg_fabricName_switchName_domainID.

1. Select Configure > Configuration > Save. The Save Switch Configurations dialog box displays.

2. Select the switches for which you want to save configuration files from Available Switches.

3. Click the right arrow to move the selected switches to Selected Switches.

4. Click OK. Configuration files from the selected switches are saved to the repository.

Restoring a switch configuration for a selected device

The Restore Switch Configuration dialog box enables you to download a previously saved switch configuration to a selected device. Stored configurations are linked to the switch WWN; therefore, if the IP address or switch name is changed and then rediscovered, the Switch Configuration Repository dialog box displays the new switch name and IP address for the old configuration. If you delete a fabric or switch from discovery, the configuration remains in the repository until you delete it manually.

1. Right-click a device in the Product List or the Connectivity Map, and select Configuration > Configuration Repository. The Switch Configuration Repository dialog box displays.

2. Select the configuration you want to restore, and click Restore.

The configuration is downloaded to the device. If necessary, the restoration process prompts you to disable and reboot the device before the configuration begins. This lets you decide whether the restore should be performed immediately or at a later time. If you confirm the restoration, the entire configuration is restored. You cannot selectively download specific configuration sections.

Scheduling Switch Backups

The Enhanced Group Management (EGM) license must be activated on a switch to perform this procedure and to use the supportSave module.

If a periodic backup is scheduled at the SAN level, that backup will apply to all switches from all discovered fabrics. Any new fabrics being discovered are automatically added to the list of fabrics to be backed up.

If a backup is scheduled for more than one fabric and some of the fabrics contain common members, the backup will include the unique switch configuration values obtained from the fabrics.

You can schedule a backup of one or more switch configurations. The configuration files are stored in the Management application database.

1. Right-click a device in the Product List or the Connectivity Map, and select Configure > Configuration > Schedule Backup.

The Schedule Backup of Switch Configurations dialog box displays.

2. Click the Enable scheduled backup check box.

3. Set the Schedule parameters:

The desired Frequency for backup operations (select weekly)

Choose a day of the week when utilization is low (e.g. Sunday)

The Time (hour, minute) at which you want the backup to run.

The maximum age allowed before you Purge Backups. The number of purge days should be at least one day more than the selected backup frequency.

The backup purge thread runs every day at 12:30 PM and deletes all backup configurations that exceed the maximum age allowed.

4. Choose one of the following options to determine the scope of the backup.

Select the Backup all fabrics check box to back up all switch configurations of discovered switches in all fabrics.

Clear the Backup all fabrics check box and select the specific fabric check boxes in the Selected Fabrics table to back up individual fabrics.

If any switches do not have the EGM license, a message displays. Click OK to enable backup on the switches that have the EGM license.

5. Click OK.
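The scheduling rules above can be checked with a short Python sketch. It assumes a weekly frequency and a naive local clock. It also assumes that a backup "exceeds the maximum age" when it is older than the Purge Backups value at the moment the daily 12:30 PM purge runs. The function names and the frequency table are hypothetical and do not correspond to any Network Advisor API.

```python
from datetime import datetime, timedelta
from typing import List

# Assumed length, in days, of the backup frequencies used in this sketch.
FREQUENCY_DAYS = {"daily": 1, "weekly": 7}


def next_weekly_run(now: datetime, weekday: int, hour: int, minute: int) -> datetime:
    """Next run on `weekday` (Monday=0 ... Sunday=6) at hour:minute, strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)  # this week's slot has already passed
    return candidate


def purge_days_ok(frequency: str, purge_days: int) -> bool:
    """Purge days should be at least one day more than the backup frequency."""
    return purge_days >= FREQUENCY_DAYS[frequency] + 1


def backups_to_purge(backup_times: List[datetime], purge_run: datetime,
                     purge_days: int) -> List[datetime]:
    """Backups older than the maximum age at the time the purge thread runs."""
    cutoff = purge_run - timedelta(days=purge_days)
    return [t for t in backup_times if t < cutoff]
```

For example, weekly backups with Purge Backups set to 8 days pass `purge_days_ok`, while 7 days does not.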

Server Data Backup and Restore

Network Advisor helps you protect your data by backing it up automatically. The data can then be restored as necessary.

What is backed up?

If the backup is set to the D:\ drive (or whichever drive is used for backup), the following files and data reside in D:\Backup:

Backup\databases contains database and log files.

Backup\data contains M-EOS switches Element Manager data files (including Dump files, Data collection progress files, Director/Switch firmware files, FAF files, Switch technical SupportSave, and Switch backup files) and Fabric OS miscellaneous files.

Backup\conf contains the Management application configuration files.

Backup\cimom contains the SMIA configuration files.

Configuring backup to a hard drive

NOTE: This requires a hard drive. The drive should not be the same physical drive on which the operating system or the Management application is installed.

To configure the backup function to a hard drive, complete the following steps (screenshot below for reference).

1. Select Server > Options. The Options dialog box displays.

2. Select Server Backup in the Category list. The currently defined directory displays in the Backup Output Directory field.

3. Select the Enable Backup check box, if necessary.

4. Choose the following option:

Select the Include FTP Root directory check box.

In selecting the FTP Root directory, the FTP Root sub-directories, Technical Support and Trace Dump, are selected automatically and you cannot clear the sub-directory selections.

5. Enter the time (using a 24-hour clock) you want the backup process to begin in the Next Backup Start Time Hours and Minutes fields.

6. Select an interval from the Backup Interval drop-down list to set how often backup occurs.

7. Browse to the hard drive and directory to which you want to back up your data (this should be a separate physical drive).

8. Click Apply or OK.

The application verifies that the backup device exists and that the server can write to it. If the device does not exist or is not writable, an error message displays that states you have entered an invalid device. Click OK to go back to the Options dialog box and fix the error. Backup occurs, if needed, at the interval you specified.
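The checks described here can be approximated in Python: the output directory must exist and be writable, and it should not sit on the drive that holds the operating system or the application. This is an illustrative sketch, not how Network Advisor performs its own validation. Note that `st_dev` identifies a file system or volume, so two partitions on the same physical disk would still pass the "separate drive" check.

```python
import os
import tempfile
from typing import List


def backup_dir_problems(backup_dir: str, install_dir: str) -> List[str]:
    """Return problems found with a proposed backup output directory (empty list if none)."""
    if not os.path.isdir(backup_dir):
        return ["backup directory does not exist"]
    problems = []
    try:
        # Create and immediately discard a temporary file to prove the directory is writable.
        with tempfile.TemporaryFile(dir=backup_dir):
            pass
    except OSError:
        problems.append("backup directory is not writable")
    # The same st_dev means the same mounted volume; this is only a proxy for "same physical drive".
    if os.stat(backup_dir).st_dev == os.stat(install_dir).st_dev:
        problems.append("backup directory is on the same volume as the installation")
    return problems
```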

Enabling backup

Backup is enabled by default. However, if it has been disabled, complete the following steps to enable the function.

1. Select Server > Options.

The Options dialog box displays.

2. Select Server Backup in the Category list.

3. Select the Enable Backup check box.

4. Click Apply or OK.

Viewing the backup status

The Management application enables you to view the backup status at a glance by providing a backup status icon on the Status Bar. The following table illustrates and describes the icons that indicate the current status of the backup function.

Server Data Restore

This can be performed via the Restore tab of the Server Management Console (see the Server Management Console section above).

Event Logs

You can view all events that take place through the Master Log at the bottom of the main window. You can also view a specific log by selecting an option from the Monitor > Logs submenu. These logs are described in the following list:

Audit Log. Displays all Application Events raised by the application modules and all Audit Syslog messages from the switches and Brocade HBAs.

Product Event Log. Displays all Product Event type events from all discovered switches and Brocade HBAs.

Fabric Log. (SAN only) Displays Product Events, Device Status, and Product Audit type events for all discovered fabrics.

FICON Log. Displays all the RLIR and LRIR type events, for example, link incident type events.

Product Status Log. (SAN only) Displays events which indicate a change in Switch Status for all discovered switches and Brocade HBAs.

Security Log. Displays all security events for the discovered switches.

Syslog Log. Displays syslog messages from switches and HBAs.

Master Log

The Master Log, which displays in the lower left area of the main window, lists the events and alerts that have occurred on the SAN. If you do not see the Master Log, select View > Show Panels > All Panels or press F5.

The following fields and columns are included in the Master Log:

Severity. The severity of the event. When the same event (Warning or Error) occurs repeatedly, the Management application automatically eliminates the additional occurrences.

Acknowledged. Whether the event is acknowledged or not. Select the check box to acknowledge the event.

Source Name. The product on which the event occurred.

Source Address. The IP address (IPv4 or IPv6 format) of the product on which the event occurred.

Origin. The event source type (for example, trap, pseudo-event, application, or syslog).

Category. The type of event that occurred (for example, client/server communication events).

Description. A description of the event.

Last Event Server Time. The time and date the event last occurred on the server.

Count. The number of times the event occurred.

Module Name. The name of the module on which the event occurred.

Message ID. The message ID of the event.

Product Address. The IP address of the product on which the event originated.

Contributor. The name of the contributor on which the event occurred.

Node WWN. The world-wide name of the node on which the event occurred.

Fabric Name. The name of the fabric on which the event occurred.

Operational Status. The operational status (such as unknown, healthy, marginal, or down) of the product on which the event occurred.

First Event Product Time. The time and date the event first occurred on the product.

Last Event Product Time. The time and date the event last occurred on the product.

First Event Server Time. The time and date the event first occurred on the server.

Audit. The audit of the event.

Virtual Fabric ID. The VFID of the product on which the event occurred.

Zone Alias. Displays the zone alias of the product or port.

Collect SupportSave

To collect switch and Network Advisor supportsaves, select Monitor > Technical Support.

Network Advisor Supportsave

To collect a Network Advisor supportsave, select Monitor > Technical Support > Supportsave.

Supportsave Manual Collection

To collect a switch supportsave, select Monitor > Technical Support > Product / Host Supportsave, and select the Generate Now tab. From the panel on the left, select the switches or fabric from which you want to collect supportsave files, and click the right arrow to move the selection to the right-hand panel. Once all the required switches are listed in the right panel, click OK to start the supportsave collection process.

A dialog box indicating the supportsave has started will be displayed.

Messages in the Master Log also indicate the start and completion of the supportsave.

Supportsave Scheduled Collection

To schedule switch supportsave collection, select Monitor > Technical Support > Product / Host Supportsave, and select the Schedule tab. From the panel on the left, set the frequency for collecting the supportsave files (weekly is recommended) and the day of the week and time to collect them (Sunday evening is recommended). Select the switches or fabric from which you want to collect supportsave files, and click the right arrow to move the selection to the right-hand panel. Once all the required switches are listed in the right panel, click OK to apply the schedule.

Event notification

Call Home

Network Advisor supports call home to IBM Support. This allows automatic creation of a problem record with IBM in response to significant error events on devices you are managing in your SAN. Additional information can be found at the following links:

Brocade Network Advisor User Manual: This is a direct link to the Call Home section of the Brocade User Manual and provides in-depth instructions on how to configure Call Home.

IBM Network Advisor Call Home Setup: This link provides IBM-specific email addresses and phone numbers to use when configuring Call Home. You may need to consult with your security team to ensure your security model allows call home via email and/or phone.

SNMP

Because accounts may not have identical infrastructures, SNMP traps should be sent to the event capture and reporting tool deployed for each account. You will need to work with your SNMP trap collector (e.g., NetView, Netcool) administrator to ensure that all alerts noted in the sections below are defined properly and are being received.

NOTE: The recommendation is to configure SNMP v3. If your capture tool does not support this, use SNMP v1 (if you must use SNMP v1, do not use the default community strings).

Trap enablement tasks

Individual SNMP traps must be configured on a per-switch basis within the Web Tools interface. Enable SNMP as follows on each of your Brocade products (switches, directors, etc.).

1. From Web Tools, click Switch Admin > Show Advanced Mode.

2. On the screen that displays, select SNMP.

3. Under SNMPv3 Inform/Trap Recipient:

Select a User Name.

Provide an IP address for the Recipient IP.

Set Trap Level to 3-Warning.
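The Python sketch below gives a rough illustration of what the Trap Level setting does. It assumes the usual FOS numbering of trap severity levels (0 = none, 1 = critical, 2 = error, 3 = warning, 4 = informational, 5 = debug). It also assumes that a recipient receives traps for events at or above the configured severity. Verify the numbering against the snmpConfig documentation for your FOS release. The function and table names are hypothetical.

```python
# Assumed FOS trap severity levels; a lower number means a more severe event.
TRAP_LEVELS = {"none": 0, "critical": 1, "error": 2, "warning": 3,
               "informational": 4, "debug": 5}


def trap_sent(event_severity: str, recipient_level: int) -> bool:
    """True if an event of this severity would be forwarded to a recipient at this trap level."""
    if recipient_level == TRAP_LEVELS["none"]:
        return False  # level 0 disables traps to this recipient
    level = TRAP_LEVELS[event_severity]
    return 1 <= level <= recipient_level
```

Under this model, a recipient at 3-Warning receives Critical, Error, and Warning events but not Informational ones. Informational-severity alerts (such as FW-1510 on code levels before 7.0.2c, described under Port Fencing) would therefore not reach that recipient.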

Fabric Watch

Fabric Watch tracks a variety of SAN fabric elements and events. It monitors fabric-wide events, ports, and environmental parameters to enable early fault detection and alerting via SNMP.

Reasons to Implement Fabric Watch

To date, IBM has generally not monitored for error conditions within our SAN environments, and this has led to multiple customer impacts that could easily have been avoided.

Fabric Watch can be enabled and thresholds set to alert on these events for code level 6.3 and above.

Fabric Watch specific alerts to be enabled are documented below.

Fabric Watch should have been purchased with the switch (it is a FOS feature, and is included automatically with all Brocade SAN switches purchased from IBM).

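When configuring Fabric Watch, follow the Fabric/Port Class and Alert Type/Threshold settings below:

| Class | Area | Alert Type | High Boundary | Time | Alert |
|---|---|---|---|---|---|
| SFP | ST | SFP State Change | 0 | Minutes | raslog |
| Fabric | ED | E_Ports Down | 0 | Minutes | raslog,snmp |
| Fabric | FC | Fabric Reconfigure | 0 | Minutes | raslog,snmp |
| Fabric | DC | Domain ID Changes | 0 | Minutes | raslog |
| Fabric | SC | Segmentation | 0 | Minutes | raslog,snmp |
| Fabric | ZC | Zone Changes | 10 | Minutes | raslog |
| Fabric | FL | Fabric Logins | 10 | Minutes | raslog |
| E_Port | ST | State Change | 10 | Minutes | raslog,snmp |
| E_Port | PE | Protocol Error | 5 | Minutes | raslog |
| E_Port | LR | Link Reset | 2 | Minutes | raslog,snmp |
| E_Port | ITW | Invalid Tx Words (enc_out) | 25 | Minutes | raslog |
| E_Port | CRC | Invalid CRCs | 5 | Minutes | raslog,snmp |
| E_Port | C3TX_TO | C3 Discards | 5 | Minutes | raslog,snmp |
| E_Port | RX | Rx Performance | 75% | Minutes | raslog |
| E_Port | TX | Tx Performance | 75% | Minutes | raslog |
| FOP_Port (Fibre Optical Port) | ST | State Change | 25 | Minutes | raslog |
| FOP_Port | PE | Protocol Error | 5 | Minutes | raslog |
| FOP_Port | LR | Link Reset | 25 | Minutes | raslog,snmp |
| FOP_Port | ITW | Invalid Tx Words (enc_out) | 25 | Minutes | raslog |
| FOP_Port | CRC | Invalid CRCs | 5 | Minutes | raslog,snmp |
| FOP_Port | C3TX_TO | C3 Discards | 5 | Minutes | raslog,snmp |
| FOP_Port | RX | Rx Performance | 90% | Minutes | raslog |
| FOP_Port | TX | Tx Performance | 90% | Minutes | raslog |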
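The Python sketch below makes the High Boundary and Time columns concrete by evaluating one minute of counter values against a few rows of the table. It is a simplified model: an alert fires when a counter rises above its High Boundary within the one-minute time base, which matches the Above alarm setting used in the configuration steps. The names are hypothetical, and this is not the Fabric Watch implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Threshold:
    port_class: str
    area: str
    high_boundary: int          # events allowed per time base (one minute)
    actions: Tuple[str, ...]


# A subset of the rows in the Fabric Watch table.
THRESHOLDS = [
    Threshold("E_Port", "CRC", 5, ("raslog", "snmp")),
    Threshold("E_Port", "LR", 2, ("raslog", "snmp")),
    Threshold("FOP_Port", "ITW", 25, ("raslog",)),
    Threshold("FOP_Port", "C3TX_TO", 5, ("raslog", "snmp")),
]


def alerts(port_class: str, counts: Dict[str, int]) -> List[Tuple[str, Tuple[str, ...]]]:
    """Return (area, actions) for each counter above its High Boundary in one minute."""
    return [(t.area, t.actions) for t in THRESHOLDS
            if t.port_class == port_class and counts.get(t.area, 0) > t.high_boundary]
```

For example, 6 CRC errors in a minute on an E_Port exceed the boundary of 5 and produce raslog and snmp alerts. Exactly 2 link resets do not exceed the LR boundary of 2.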

Configuring Fabric Watch

1. Log in to Web Tools and open the Fabric Watch GUI.

2. Select the appropriate Class (F/FL Optical Port, E-Port, or Fabric) from the left pane.

3. From the Threshold Configuration tab at the top, select Trait Configuration.

4. Enter the Time Base and High Boundary (from the settings noted above in this document).

5. Select Custom Defined and click Apply. The example below configures E_Ports to alert on CRC errors that exceed 5 within 1 minute.

6. Select the Alarm Configuration tab

7. Select Above for ERROR_LOG, SNMP_TRAP (and EMAIL_ALERT if applicable). If email alerting is used you will need to provide an address via the Email Configuration tab (top right of screen in above example).

8. Select Custom Defined and click Apply (this must be done for each alert).

9. Once parameters for all alerts have been set, the same configuration may be replicated to other switches.

From Network Advisor, select Configure > Configuration > Replicate > Configuration.

Select Configuration Type > Partial FC > Fabric Watch.

Select Configuration from the Switch.

Select the switch on which you just configured all Fabric Class, E_Port, and F_Port Class settings.

Select the other switches in your fabric on which you want to enable Fabric Watch (using the same settings).

After these selections, Validation and Summary screens are presented to complete the distribution of the Fabric Watch settings.

Bottleneck Credit Tools

The bottleneck credit tool is used to automatically reset back-end ports when loss of credit is detected on those ports. This function was introduced in Brocade FOS v7.0.0 and v6.4.2 and was further enhanced with improved credit loss detection in FOS v7.0.1b and v6.4.3.

Enabling bottleneck credit tools

Use the bottleneckmon --cfgcredittools command to enable or disable credit recovery of back-end ports, and use the bottleneckmon --showcredittools command to display the configuration. When this feature is enabled, credit is recovered on back-end ports (ports connected to the core blade or core blade back-end ports) when credit loss is detected on these ports. If complete loss of credit on a Condor 2 back-end port causes frame timeouts, a link reset (LR) is performed on that port regardless of the configured setting, even if that setting is -recover off.

When used with the -recover onLrOnly option, the recovery mechanism takes the following escalating actions:

When the mechanism detects credit loss, it performs an LR and logs a RASlog message (CX-1014).

If the LR fails to recover the port, the port reinitializes. A RASlog message is generated (CX-1015). Note that the port reinitialization does not fault the blade.

If the port fails to reinitialize, the port is faulted. A RASlog message (RAS CX-1016) is generated.

If a port is faulted and there are no more online back-end ports in the trunk, the port blade is faulted. A RASlog message (RAS CX-1017) is generated.

Enable the credit recovery tool with the onLrOnly option:

bottleneckmon --cfgcredittools -intport -recover onLrOnly

Bottleneck Detection

As transmission speeds within SAN fabrics continue to increase, devices causing latency within the fabric have a larger impact on the overall health of the fabric. Devices causing latency have caused multiple customer impacts within IBM. Bottleneck Detection provides a way to automatically watch for and alert on high-latency devices. This capability has already shortened the duration of impacts in IBM-operated environments from days to hours.

Recommendations

Field experience shows that the original strategy of enabling Bottleneck Detection with conservative latency thresholds almost always yields no results. There was a concern that aggressive values would result in Bottleneck Detection alert storms, but this has not been the case. Even the most aggressive values generate relatively few alerts. As a result, the recommendation is now to try the most aggressive settings first and then back off gradually if too many alerts are seen. On the Brocade 48000, no more than 100 ports should be monitored because of memory constraints.

Congestion Threshold (-cthresh): New starting with code level 6.4. This parameter monitors bandwidth utilization, that is, the percentage of time that a link exceeds 95% utilization. The recommendation is to stay with the Brocade default value for this setting (80%). With this value, an alert is sent if an individual link exceeds 95% utilization for 80% or more of the measurement interval (the time specification, 30 seconds).

Latency Threshold (-lthresh): The minimum percentage of time during which latency is detected (default is 20%, or 0.2). This is the parameter to adjust when fine-tuning.

BD Window (-time): Specifies the measurement interval for measuring latency.

Quiet Time (-qtime): Specifies how often to send any tripped alerts.

Suggested Bottleneck Settings

FOS 6.3

| Parameter | Conservative Settings | Normal Settings | Aggressive Settings |
|---|---|---|---|
| -time | 300 | 60 | 5 |
| -qtime | 300 | 60 | 1 |
| -thresh | 0.3 | 0.2 | 0.1 |

FOS 6.4

| Parameter | Conservative Settings | Normal Settings | Aggressive Settings |
|---|---|---|---|
| -time | 300 | 60 | 5 |
| -qtime | 300 | 60 | 1 |
| -lthresh | 0.3 | 0.2 | 0.1 |
| -cthresh | 0.8 | 0.5 | 0.1 |

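FOS 7.0

| Parameter | Conservative Setting | Normal Setting | Aggressive Setting |
|---|---|---|---|
| -time | 300 | 60 | 5 |
| -qtime | 300 | 60 | 1 |
| -lthresh | 0.3 | 0.2 | 0.1 |
| -cthresh | 0.8 | 0.5 | 0.1 |
| -lsubsectimethresh | 0.8 | 0.5 | 0.5 (no less) |
| -lsubsecsevthresh | 75 | 50 | 1 |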
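The effect of the three columns can be illustrated with a simplified Python model of one measurement window that uses the -lthresh and -cthresh values from the tables above. The model assumes one sample per second and treats a second as congested when utilization exceeds 95%. It reports a bottleneck when the affected fraction of the window reaches the threshold. It ignores -qtime and the sub-second parameters and is not how the switch computes bottlenecks internally.

```python
from typing import Dict, Sequence

# -lthresh / -cthresh values from the FOS 6.4 and 7.0 tables.
PROFILES = {
    "conservative": {"lthresh": 0.3, "cthresh": 0.8},
    "normal": {"lthresh": 0.2, "cthresh": 0.5},
    "aggressive": {"lthresh": 0.1, "cthresh": 0.1},
}


def fraction(flags: Sequence[bool]) -> float:
    """Fraction of samples in the window that are True."""
    return sum(flags) / len(flags) if flags else 0.0


def evaluate_window(latency_seconds: Sequence[bool],
                    utilization_pct: Sequence[float],
                    profile: str) -> Dict[str, bool]:
    """Classify one window (one sample per second) as latency- and/or congestion-bottlenecked."""
    limits = PROFILES[profile]
    congested = [u > 95.0 for u in utilization_pct]
    return {
        "latency": fraction(latency_seconds) >= limits["lthresh"],
        "congestion": fraction(congested) >= limits["cthresh"],
    }
```

Consider a 60-second window (-time 60) with latency in 9 seconds (15%) and utilization above 95% in 40 seconds (about 67%). The conservative profile flags nothing, the normal profile flags only congestion, and the aggressive profile flags both. This is consistent with the field observation above that conservative values rarely produce alerts.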

Implementation

NOTE: The bottleneck detection feature detects latency bottlenecks only at the point of egress, not ingress.

Enable Bottleneckmon via GUI

1. Select Monitor > Performance > Bottlenecks.

The Bottlenecks dialog box displays.

2. Select Enable if it is not already selected.

3. Select the Alerts check box to enable alerts.

4. Use the following initial settings (see Suggested Bottleneck Settings for additional tuning values):

Congestion 50%

Latency 20%

Window 60 seconds

Quiet Time 60 seconds

5. Select Ports from the Products/Ports list. Select only F_Ports.

6. Click the right arrow to apply the settings in the Bottleneck Detection pane to the selected elements in the Products/Ports list.

7. Click OK or Apply to save your changes.

8. Tune your initial settings as needed, using the values in Suggested Bottleneck Settings.

Enable Bottleneckmon via CLI

FOS 6.4:

bottleneckmon --enable -lthresh 0.2 -cthresh 0.5 -time 60 -qtime 60 -alert

FOS 7.0:

bottleneckmon --enable -lthresh 0.2 -cthresh 0.5 -time 60 -qtime 60 -lsubsectimethresh 0.5 -lsubsecsevthresh 50 -alert

How Bottlenecks are reported in Network Advisor

Bottlenecks are reported through alerts in the Master Log. A bottleneck cleared alert is sent when the bottleneck is cleared.

NOTE: A bottleneck cleared alert is sent if you disable bottleneck detection on a bottlenecked port, even though the port is still bottlenecked.

Bottlenecks can be highlighted in the Connectivity Map and Product List. Select Monitor > Performance > View Bottlenecks. If a port is experiencing a bottleneck, a bottleneck icon is displayed in the Connectivity Map for the switch and fabric, and in the Product List for the port, switch, and fabric. In the figure below, port15 and port22 are bottlenecked.

Port Fencing

Reasons to Implement Port Fencing

As transmission speeds within SAN fabrics continue to increase, devices causing latency within the fabric have a larger impact on the overall health of the fabric. The health of the fabric may degrade faster than an alert can be sent and received by the monitoring team, support tickets opened, and the manual action required to protect the fabric taken.

Port Fencing provides a way to have the fabric respond to error-level thresholds by disabling ports with high error rates. It sends an alert that this action has been taken so the steady state team can repair the situation and then bring the port back online.

Implementation

NOTE: Port Fencing should be implemented only after the environment has successfully implemented Fabric Watch using the settings recommended in this guide. Healthy SAN fabrics are a prerequisite to implementation of Port Fencing. DO NOT implement Port Fencing unless the following criteria are met:

The environment is running code level 7.0.2c or newer. In code levels prior to 7.0.2c, the FW-1510 alert sent by the switch to inform administrators that Port Fencing has disabled ports is at an Informational severity level. This alert severity was raised to Error in the 7.0.2c release.

The monitoring or steady state team has the cycles to monitor Informational SNMP alerts from the SAN switches.

A mature SNMP monitoring and response process must be in place prior to implementation of Port Fencing. Because Port Fencing disables ports, a steady state team must receive these alerts and take action to fix the port and bring it back online. Failure to take action will result in future Client Impacting Events.

Example: 1 of 2 SAN ports for a server exceeds the Port Fencing threshold, and the port is automatically disabled by the SAN switch. The steady state team does not repair the port and bring it back online. A month later the remaining HBA in the server fails, and the server now has no connectivity to back-end SAN storage devices.

When configuring Port Fencing within FOS v6.4.2a, the Violation Type and Threshold settings below should be followed:

E Port Class Area (note: the Time Base for all alerts = 1 minute)

| Violation Type | Threshold |
|---|---|
| Protocol Error | 10 |
| Link Reset | 10 |
| Invalid Words (enc out) | 60 |
| Invalid CRCs | 30 |

F Port Class Area (note: the Time Base for all alerts = 1 minute)

| Violation Type | Threshold |
|---|---|
| Protocol Error | 5 |
| Link Reset | 200 |
| Invalid Words (enc out) | 40 |
| Invalid CRCs | 20 |
| C3 Discards (C3TX_TO) | 40 |
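The Python sketch below shows the decision Port Fencing makes with these thresholds. It assumes a port is fenced when any counter exceeds its threshold within the one-minute time base. The dictionaries mirror the E Port and F Port tables above, using the Fabric Watch area abbreviations (PE, LR, ITW, CRC, C3TX_TO). The function name is hypothetical, and real switches apply their own evaluation logic.

```python
from typing import Dict, List

# Per-minute Port Fencing thresholds from the E Port and F Port tables.
E_PORT_THRESHOLDS = {"PE": 10, "LR": 10, "ITW": 60, "CRC": 30}
F_PORT_THRESHOLDS = {"PE": 5, "LR": 200, "ITW": 40, "CRC": 20, "C3TX_TO": 40}


def fence_reasons(port_class: str, counts_per_minute: Dict[str, int]) -> List[str]:
    """Return the violation types that would fence the port (an empty list means it stays online)."""
    if port_class == "E":
        thresholds = E_PORT_THRESHOLDS
    elif port_class == "F":
        thresholds = F_PORT_THRESHOLDS
    else:
        raise ValueError(f"unknown port class: {port_class!r}")
    return sorted(violation for violation, limit in thresholds.items()
                  if counts_per_minute.get(violation, 0) > limit)
```

The same one-minute counts can fence an F_Port but not an E_Port, or the reverse. For example, 21 CRC errors trip the F Port limit of 20 but stay under the E Port limit of 30. This is why the steps below warn against assigning a threshold to the wrong port class.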

Adding thresholds (Violation types):

1. To access Port Fencing, select Monitor > Fabric Watch > Port Fencing. The Port Fencing dialog box displays.

2. Select C3 Discard Frames from Violation Type and click Add.

3. In the pop-up window, enter a Name, select Custom, and enter the Threshold and Time (per the parameters noted above).

Assigning thresholds to ports:

To assign an existing threshold to a port type, complete the following steps.

1. Select Monitor > Fabric Watch > Port Fencing. The Port Fencing dialog box displays.

2. Select a threshold type from the Violation Type list

3. Select the threshold you want to assign from the Thresholds table

4. Select the Port Type (E Port Class or F Port Class noted above), to which you want to assign the threshold from the Ports table. Do NOT assign a Port Type/Class to an incorrect Violation Type.

5. Click the right arrow

A directly assigned icon displays next to the objects you selected in the Ports table to show that the threshold was applied at this level.

An added icon appears next to every object in the tree to which the new threshold is applied.

6. Click OK on the Port Fencing dialog box.

Unblocking a Port

Network Advisor allows you to unblock a port (only if it was blocked by Port Fencing) once the problem that triggered the threshold is fixed. When a port is blocked, an Attention icon displays next to the port node.

To unblock a port, complete the following steps.

1. Select Monitor > Fabric Watch > Port Fencing.

The Port Fencing dialog box displays.

2. Right-click anywhere in the Ports table and select Expand.

3. Select a blocked port from the Ports table.

4. Click Unblock.

5. Click OK on the message.

If you did not solve the root problem, the threshold will trigger again.

6. Click OK on the Port Fencing dialog box.

Removing Thresholds

To remove thresholds from the All Fabrics object, an individual Fabric, Chassis group, Switch, or Switch Port, complete the following steps.

1. Select Monitor > Fabric Watch > Port Fencing.

The Port Fencing dialog box displays.

2. Select a threshold type from the Violation Type list.

3. Select the object with the threshold you want to remove in the Ports table.

4. Click the left arrow.

Brocade Fabric Vision

Brocade Fabric Vision is a collection of hardware and software functions in FOS 7.2 and Gen 5 Fibre Channel switches. Fabric Vision consists of the following elements:

MAPS (Monitoring and Alerting Policy Suite): recommended; see Monitoring and Alerting Policy Suite (MAPS)

Bottleneck Detection: recommended; see Bottleneck Detection

Credit Loss Detection: recommended; see Bottleneck Credit Tools

Forward Error Correction: enabled on Gen 5 hardware switches

Brocade ClearLink Diagnostics: for installation and diagnostic use

Network Advisor Dashboards: recommended; see Network Advisor Dashboards

Flow Vision (includes Flow Monitoring, Flow Mirroring and Flow Generation): for advanced PD only

Some Fabric Vision technology features are supported on Gen 4 b-type platforms; others are available only on Gen 5 Fibre Channel platforms with 16 Gbps performance capability. The chart below shows the various Fabric Vision technology features supported on each generation of products:

| Feature | Gen 4 Platforms (8 Gbps FC and associated capabilities) | Gen 5 Platforms (16 Gbps FC and associated capabilities) |
|---|---|---|
| Latency Bottleneck Detection | Yes | Yes |
| Forward Error Correction | No | Yes |
| VC-level BB_Credit Recovery | No | Yes |
| ClearLink Diagnostics (D_Port) | No | Yes |
| MAPS | Yes | Yes |
| Flow Monitoring | Yes, with some limitations | Yes |
| Flow Mirroring | No | Yes |
| Flow Generator | No | Yes |

Monitoring and Alerting Policy Suite (MAPS)

The Monitoring and Alerting Policy Suite (MAPS) is a storage area network (SAN) health monitor supported on all switches running Fabric OS 7.2.0 or later. It replaces Fabric Watch as the default health monitor once the switch is at FOS v7.2.0 or later. MAPS allows you to enable each switch to constantly monitor itself for potential faults, and it automatically alerts you to problems before they become failures.

It is recommended that you set up MAPS directly rather than migrating the Fabric Watch settings, unless Fabric Watch was set up for a specific reason. See Initial MAPS setup.

MAPS Licensing Requirements and Software Prerequisites

Switches with Fabric Watch and Advanced Performance Monitor licenses automatically get the Fabric Vision license features by upgrading to FOS v7.2.

Switches with only Fabric Watch or only Advanced Performance Monitor can upgrade to Fabric Vision by purchasing the other license (either Fabric Watch or Advanced Performance Monitor).

MAPS Software Prerequisites:

FOS Version: v7.2.0d

IBM Network Advisor: 12.13 or higher.

NOTE: MAPS is the follow-on product to Fabric Watch. Although both require a license, Fabric Watch customers can upgrade to MAPS at no additional cost.

If the switch currently has Fabric Watch set up and properly monitoring the fabric, those Fabric Watch settings can be migrated to MAPS rules.

Differences between Fabric Watch and MAPS configurations

| Configuration | Fabric Watch behavior | MAPS behavior |
|---|---|---|
| End-to-End monitoring (Performance Monitor class) | Supported | Supported through flows. |
| Frame monitoring (Performance Monitor class) | Supported | Supported through flows. |
| RX, TX monitoring | Occurs at the individual physical port level. | Occurs at the trunk or port level as applicable. |
| Pause/Continue behavior | Occurs at the element or counter level. For example, monitoring can be paused for CRC on one port and for ITW on another port. | Occurs at the element level. Monitoring can be paused on a specific port, but not for a specific counter on that port. |
| CPU/Memory polling interval | Can configure the polling interval as well as the repeat count. | This configuration can be migrated from Fabric Watch, but cannot be changed. |
| E-mail notification configuration | Different e-mail addresses can be configured for different classes. | E-mail configuration supported globally. |
| Temperature sensor monitoring | Can monitor temperature values. | Can monitor only the states of the sensors (In_Range or Out_of_range). |

Converting from Fabric Watch to MAPS

4. Back up the switch configuration using configupload.

5. Use mapsconfig --fwconvert to convert Fabric Watch rules to MAPS. If Fabric Watch is currently in use, this must be done before enabling MAPS to preserve the Fabric Watch thresholds. Three new MAPS policies are created: fw_active_policy, based on the currently active Fabric Watch settings; fw_default_policy, based on the Fabric Watch default settings; and fw_custom_policy, based on any custom Fabric Watch policies that were created.

6. The conversion is one way; you cannot convert MAPS rules back to Fabric Watch.

7. The first time you enable MAPS using the command mapsconfig --enablemaps -policy fw_active_policy, you receive a warning (see the screenshot below).

8. Set the allowable actions for rules using mapsconfig --actions raslog,snmp,email,sw_critical,sw_marginal,sfp_marginal

Make sure port fencing (the fence action) is not included in the mapsconfig --actions command.

Initial MAPS setup

For switches running FOS 7.2 or higher that do not currently have Fabric Watch configured to monitor and alert on fabric events, or when a clean MAPS setup is required, use the following procedure.

The recommended port monitoring strategy is to log marginal port events to the RAS log, which should be reviewed on a regular basis, and to generate SNMP or email alerts for serious port events that need immediate attention.

Note: To implement this policy, you can simply import the IBM_SO policy; see Importing MAPS configuration.

Because the MAPS default policies generate SNMP/email alerts for all of their port events, the strategy is to copy a default policy as a base and replace its port rules with rules that implement the above strategy, based on the settings defined for Fabric Watch.

9. Create a copy of the MAPS default moderate policy to use as a base:

mapspolicy --clone dflt_moderate_policy -name IBM_SO

10. Remove the port rules from the policy using the following commands. Note: These commands must be run as root.

for i in $(mapspolicy --show IBM_SO | grep defNON | awk '{print $1}'); do mapspolicy --delrule IBM_SO -rulename $i; done

for i in $(mapspolicy --show IBM_SO | grep E_PORTS | awk '{print $1}'); do mapspolicy --delrule IBM_SO -rulename $i; done

for i in $(mapspolicy --show IBM_SO | grep F_PORTS | awk '{print $1}'); do mapspolicy --delrule IBM_SO -rulename $i; done

for i in $(mapspolicy --show IBM_SO | grep T_PORTS | awk '{print $1}'); do mapspolicy --delrule IBM_SO -rulename $i; done

mapspolicy --delrule IBM_SO -rulename defSWITCHSEC_TS_D10

11. Create new F-Port rules for the new IBM_SO policy:

mapsRule --create F_PORTS_PE_5 -group ALL_F_PORTS -timebase min -op g -policy IBM_SO -monitor PE -value 5 -action RASLOG

mapsRule --create F_PORTS_ITW_25 -group ALL_F_PORTS -timebase min -op g -policy IBM_SO -monitor ITW -value 25 -action RASLOG

mapsRule --create F_PORTS_CRC_5 -group ALL_F_PORTS -timebase min -op g -policy IBM_SO -monitor CRC -value 5 -action RASLOG,SNMP,EMAIL

mapsRule --create F_PORTS_CRC_H25 -group ALL_F_PORTS -timebase hour -op g -policy IBM_SO -monitor CRC -value 25 -action RASLOG,SNMP,EMAIL

mapsRule --create F_PORTS_LR_3 -group ALL_F_PORTS -timebase min -op g -policy IBM_SO -monitor LR -value 3 -action RASLOG,SNMP,EMAIL

mapsRule --create F_PORTS_LR_H10 -group ALL_F_PORTS -timebase hour -op g -policy IBM_SO -monitor LR -value 10 -action RASLOG,SNMP,EMAIL

mapsRule --create F_PORTS_C3TXTO_5 -group ALL_F_PORTS -timebase min -op g -policy IBM_SO -monitor C3TXTO -value 5 -action RASLOG,SNMP,EMAIL

mapsRule --create F_PORTS_TX_90 -group ALL_F_PORTS -timebase min -op g -policy IBM_SO -monitor TX -value 90 -action RASLOG

mapsRule --create F_PORTS_RX_90 -group ALL_F_PORTS -timebase min -op g -policy IBM_SO -monitor RX -value 90 -action RASLOG

12. Create new E-Port rules for the new IBM_SO policy:

mapsRule --create E_PORTS_PE_5 -group ALL_E_PORTS -timebase min -op g -policy IBM_SO -monitor PE -value 5 -action RASLOG
mapsRule --create E_PORTS_ITW_25 -group ALL_E_PORTS -timebase min -op g -policy IBM_SO -monitor ITW -value 25 -action RASLOG
mapsRule --create E_PORTS_CRC_5 -group ALL_E_PORTS -timebase min -op g -policy IBM_SO -monitor CRC -value 5 -action RASLOG,SNMP,EMAIL
mapsRule --create E_PORTS_CRC_H25 -group ALL_E_PORTS -timebase hour -op g -policy IBM_SO -monitor CRC -value 25 -action RASLOG,SNMP,EMAIL
mapsRule --create E_PORTS_LR_3 -group ALL_E_PORTS -timebase min -op g -policy IBM_SO -monitor LR -value 3 -action RASLOG,SNMP,EMAIL
mapsRule --create E_PORTS_LR_H10 -group ALL_E_PORTS -timebase hour -op g -policy IBM_SO -monitor LR -value 10 -action RASLOG,SNMP,EMAIL
mapsRule --create E_PORTS_ST_1 -group ALL_E_PORTS -timebase min -op g -policy IBM_SO -monitor STATE_CHG -value 1 -action RASLOG,SNMP,EMAIL
mapsRule --create E_PORTS_C3TXTO_5 -group ALL_E_PORTS -timebase min -op g -policy IBM_SO -monitor C3TXTO -value 5 -action RASLOG,SNMP,EMAIL
mapsRule --create E_PORTS_TX_75 -group ALL_E_PORTS -timebase min -op g -policy IBM_SO -monitor TX -value 75 -action RASLOG
mapsRule --create E_PORTS_RX_75 -group ALL_E_PORTS -timebase min -op g -policy IBM_SO -monitor RX -value 75 -action RASLOG

13. Enable the IBM_SO policy:

mapsConfig --enablemaps -policy IBM_SO

14. Set allowable actions:

mapsconfig --actions RASLOG,SNMP,EMAIL,SW_CRITICAL,SW_MARGINAL,SW_HEALTHY,SFP_MARGINAL

Importing MAPS configuration

It is possible to import a MAPS policy and its rules instead of manually setting up MAPS as per the above section. There is an IBM_SO MAPS policy which is available and can be imported to enable setting up MAPS quickly.

Select the MAPS configure dialog by selecting Monitor->Fabric Vision->MAPS->Configure

Select the switch you want to import the MAPS policy into and select the IMPORT button.

Select the IBM_SO XML file.

A progress message is displayed during the import.

A final status message is displayed when the import is completed.

To activate the policy expand the list of policies for the switch, select the IBM_SO policy and press the Activate push button.

To enable the appropriate actions for the switch, select the switch and press the Actions push button.

Typically all actions except Fence are enabled.
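If you build IBM_SO by hand instead of importing it (steps 11 and 12 under Initial MAPS setup), every port rule follows the same pattern, and the rule name prefix, the port group and the policy must agree. A mismatch, such as an E_PORTS_ rule attached to ALL_F_PORTS, would monitor F-Ports instead of E-Ports. The sketch below is a hypothetical helper (gen_port_rules is not a FOS command) that prints mapsRule --create lines from a compact table and always derives the group from the prefix. It assumes the same mapsRule option order used in this guide and only prints commands for review.

```shell
#!/bin/sh
# Hypothetical helper: prints mapsRule --create commands for one port class.
# $1 = port class prefix (E_PORTS or F_PORTS); $2 = policy name.
# Reads spec lines from stdin: SUFFIX MONITOR TIMEBASE VALUE ACTIONS
# The group is always derived from the prefix (ALL_E_PORTS / ALL_F_PORTS),
# so an E-Port rule cannot be attached to ALL_F_PORTS by mistake.
gen_port_rules() {
    prefix=$1
    policy=$2
    while read -r suffix monitor timebase value actions; do
        # Skip blank lines and comment lines in the spec.
        case "$suffix" in ''|'#'*) continue ;; esac
        printf 'mapsRule --create %s_%s -group ALL_%s -timebase %s -op g -policy %s -monitor %s -value %s -action %s\n' \
            "$prefix" "$suffix" "$prefix" "$timebase" "$policy" "$monitor" "$value" "$actions"
    done
}
```

For example, feeding the line CRC_H25 CRC hour 25 RASLOG,SNMP,EMAIL to gen_port_rules E_PORTS IBM_SO prints the E_PORTS_CRC_H25 rule with -group ALL_E_PORTS. Keeping the thresholds in a small table also makes the rule set easy to review against the strategy of logging marginal events and alerting on serious ones.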

Replicating a policy to other devices

You can replicate a non-default policy on a device to all MAPS-capable devices in a Fabric or SAN.

NOTE: Copying a policy from one device to another overwrites any policy with a matching name on the target devices

Right-click a device in the Product List or Connectivity Map and select Fabric Vision > MAPS > Configure

The MAPS Configuration dialog box displays.

Select a non-default policy on a device (source) you want to replicate in the list and click Distribute.

The Distribution Options dialog box displays.

Set the destination by choosing one of the following options:

All fabric distribution: Select to replicate the policy on all MAPS-capable devices in the SAN.

Within fabric distribution: Select to replicate the policy on all MAPS-capable devices in the selected Fabric.

Set the activation parameters by choosing one of the following options:

Activate policy on each switch: Select to immediately activate the policy on the target devices after distribution. If the selected policy is not an active policy, this option activates the policy on the source device as well as the target devices.

Do not activate policy on each switch: Select to not activate the policy on the target devices after distribution.

Click OK on the Distribution Options dialog box.

The selected policy is replicated on all MAPS-capable devices in the selected Fabric or SAN.

If you chose to activate the policy after distribution, the selected policy is activated on the target devices and, if necessary, on the source device.

Click Close on the MAPS Configuration dialog box.

MAPS and Bottleneck Monitor

1 The MAPS dashboard (mapsdb --show) simplifies bottleneck event integration in FOS v7.2. Bottleneck events are reported in the summary section of the report output.

2 The MAPS dashboard is used only for logging bottleneck latency events. Congestion bottleneck events are not logged on the MAPS dashboard.

3 The MAPS dashboard continues to log events whether RASLog reporting is set to on or off in the bottleneck configuration.

4 The MAPS dashboard history section updates its display of CRED_ZERO (measured in millions) and BN_SECS values at one-minute intervals.

4.1 BN_SECS indicates the total seconds that were marked as being affected by bottlenecks since the previous midnight.
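Because BN_SECS is cumulative since the previous midnight, the same value means more early in the morning than late in the evening. One simple way to compare readings is to express BN_SECS as a share of the seconds elapsed since midnight. The helper below is a hypothetical illustration, not a MAPS or FOS feature; it assumes you read BN_SECS from the mapsdb --show history and supply the elapsed time yourself. For example, 540 seconds at 06:00 (21,600 seconds after midnight) is 2.5 percent of the day so far.

```shell
#!/bin/sh
# Hypothetical helper (not a MAPS feature): expresses BN_SECS as a percentage
# of the seconds elapsed since midnight, so readings taken at different times
# of day can be compared.
# $1 = BN_SECS value read from the mapsdb --show history
# $2 = seconds elapsed since midnight when the value was read
bn_share() {
    bn_secs=$1
    elapsed=$2
    if [ "$elapsed" -le 0 ]; then
        echo "elapsed seconds must be greater than 0" >&2
        return 1
    fi
    # awk handles the floating-point division; print one decimal place.
    awk -v b="$bn_secs" -v e="$elapsed" 'BEGIN { printf "%.1f\n", b * 100 / e }'
}
```

Running bn_share 540 21600 prints 2.5, matching the example above.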

Enable MAPS in Network Advisor

1. Log in to NA. From the Monitor menu, choose the Fabric Vision submenu, then select MAPS and Enable.

Activate MAPS Policy from Network Advisor

Log in to NA. From the Monitor menu, choose the Fabric Vision submenu, then select MAPS and Configure.

Highlight the switch to be configured, select the dflt_moderate_policy or IBM_SO policy and click the Activate button.

Confirm dflt_moderate_policy or IBM_SO is now the active policy.

View the Parameters in a Policy

Log in to NA. From the Monitor menu, choose the Fabric Vision submenu, then select MAPS and Configure.

Highlight the switch with the policy to be viewed, select the policy and click the View button.

Choose the tab related to the parameter to be viewed (Port, Switch Status, Fabric, FRU, Security, Resource, FCIP, Traffic/Flows)

Network Advisor Dashboards

The IBM Network Advisor Dashboard widgets, Event Logs, and SAN Health described below are great tools for everything from quick assessments to in-depth investigation of the overall health of your SAN.

Dashboard Tab: at a glance

The Dashboard tab provides a high-level overview of the network and the current states of managed devices, so you can easily check the status of the devices on the network. The dashboard also provides several features to help you quickly access reports, device configuration, and system logs. The dashboard updates every 5 seconds regardless of the currently selected tab (SAN or Dashboard) or the SAN size. However, data may be momentarily out of sync between the dashboard and other areas of the application. For example, if you remove a product from the network while another user navigates from the dashboard to a more detailed view of the product, the product may not appear in the detailed view.

The Dashboard includes the following widgets:

1. SAN Operational Status. Displays the device status as a pie chart, with each slice showing, in its own color, the percentage of the total number of devices in that state. Displays the color legend below the pie chart. Displays tooltips on mouse-over to show the number of devices in that state. When there is one status category with less than one percent of the total number of devices, the status widget displays the number of devices in each category on each slice.

2. SAN Inventory. Displays the SAN products inventory as stacked bar graphs, with each group shown as a separate bar. Displays the current state of all products discovered for a group in various colors on each bar. Displays the color legend below the y-axis. Displays tooltips on mouse-over to show the number of devices in that state.

3. Events. Displays the number of events by severity level for a specified time range as a stacked bar graph. You can customize this widget to display a specific time range. Options include: This Hour, Last Hour, Last 24 Hours, Last 7 Days, or Last 30 Days.

Brocade SAN Health Report

SAN Health is a powerful (and free) utility from Brocade for surveying your SAN. SAN Health should be run at least monthly; doing so will help you recognize trends in your environment as well as unknown current or potential issues. Maintaining a regular set of SAN Health reports can also aid troubleshooting, as they provide a detailed history of events taking place in your SAN.

You can download this utility and instructions for using it from Brocade at:

www.brocade.com/sanhealth

Brocade SAN Health reports contain information such as the following:

Fabric level information: total port count, performance, oversubscription ratios, port utilization, and number of attached devices, followed by specific information on each fabric, such as the connected switches, zoning configuration, and a port map.

Switch level information: licenses, port level configurations, and ISL usage.

Port level information: bandwidth utilization, CRC counts, and port status, which together provide a snapshot of overall port health.

Visio diagram: shows the logical connection of the switches in the fabrics as well as the connected devices. ISLs, trunks, and devices are shown exactly as they are connected to the switch ports, so the fabric topology and other information can be viewed quickly and easily.

Customized views of devices allow for online device identification, a snapshot of performance statistics, and switch attachment details.

Other items in this report include historical performance graphs plus guidelines and recommendations.

NOTE: Past reports should be saved for trending, troubleshooting, and planning purposes. These reports can be very helpful when trying to identify the source of an issue and should be readily available for Crit-Sit and Sev-1 situations.

Instructions For Usage

1. Identify the name of the customer in the SAN Health .BSH upload file name

Good Example: James_Smith_120203_1201_ACMEcompany_LexingtonKY.BSH

Bad Example: James_Smith_130610_1454_LEX_FAB.BSH (does not identify which account in LEX)

2. After uploading the .BSH file to [email protected], send a follow-up email to the Brocade alias [email protected] letting the Brocade team know the SAN Health report is coming. This avoids duplicate effort and allows a faster response than sending it to individual team members.

a. Include the full file name(s) that were uploaded, e.g. James_Smith_120203_1201_ACMEcompany_LexingtonKY.BSH

When sending any eMails to Brocade, please be sure to include:

Your name

Your eMail and phone number

Customer name

The geography where the device(s) are (or will be) installed

Device Type / Model, and quantity

Account focal (e.g. SAN Architect, DPE, etc.): name and contact information

A description of the problem or the reason for the SAN Health review request

b. If there are any open PMRs/SRs, list their numbers.

3. When configuring the SAN Health client (http://www.brocade.com/services-support/drivers-downloads/san-health-diagnostics/download_san_health.page), be sure to include [email protected]; see the screenshot below:

4. Select the option to create a separate Visio diagram for each fabric:

5. Clear the stats on all switches by running slotstatsclear and statsclear at least 24 hours before running the SAN Health report.

6. Set performance capture to a minimum of two hours for graphs.

7. Make sure you are using the latest client, v3.2.6c, downloaded from http://www.brocade.com/services-support/drivers-downloads/san-health-diagnostics/download_san_health.page

8. Follow-up SAN Health review requests should include the status of all actions called out in the previous review's Brocade Recommendation Summary.

Zoning

All zoning tasks must be performed from the Zoning dialog box in the Network Advisor application. You can access the Zoning dialog box from the main screen of the Management application using any of the following methods:

Select Configure > Zoning > Fabric.

Click the Zoning icon on the toolbar.

Right-click a port, a switch, a switch group, or fabric in the device list and select Zoning.

Right-click a port, a switch, a switch group, or fabric in the Connectivity Map and select Zoning.

NOTE: The following points need to be observed when performing zoning operations

Zoning via the CLI or Web Tools interface should never be performed due to the increased potential for catastrophic customer-impacting mistakes associated with these methods.

Single-Initiator Zoning should be used for all zoning. A single-initiator zone contains one HBA in a zone with target device/s.

Your default zoning mode should be set to No Access. This means unzoned devices cannot see each other, so a zone must be established before they can communicate.

The following is a procedure for zoning in a Brocade fabric using IBM Network Advisor. It ensures the following:

The current zone configuration in the fabric will be saved to the Network Advisor offline repository and can be restored to the fabric if necessary.

Multiple copies of the fabric zoning configuration will be stored in the offline repository. The number of copies will be dependent on your policy for cleaning out old zone DB copies in the offline repository.

The offline repository will be backed up as part of the scheduled Network Advisor backup when that backup occurs. There will be exposure to lost updates to the zoning DBs should the Network Advisor server become unavailable and have to be restored. The updates from the time of the last backup until the time the server is lost would be unrecoverable.

The current active Fabric Zone DB will always be the zoning DB used for updating when zoning changes are necessary in the fabric. The offline repository zone DBs will only be used for recovery if necessary.

The following will demonstrate the steps necessary to make changes to the current zone configuration and assure a copy of the current zone DB is stored to the offline repository as a fallback if necessary.

The current Fabric Zone DB consists of only 1 zone configuration.

A request has come in to add an additional zone to the fabric; we will add this zone as zone4. Updates to fabric zoning will always be made to the current active zone configuration in the Fabric Zone DB.

To ensure that the Network Advisor zoning configuration window is current and that you are viewing what is currently active in the fabric, perform a Zone DB Operation to refresh the DB. Verify that the Zone DB listed is the Fabric Zone DB, then perform a refresh.

Zone DB Operation > Refresh (See below). You will receive a message indicating that you are overwriting the selected zone DB with the one in the fabric, see below. Respond Yes; this guarantees that your current view of the Fabric Zone DB is what exists in the fabric.

You will now want to save a copy of the current Fabric Zone DB to the offline repository so that you have a copy to fall back to if necessary. Zone DB Operation > Save As (See below)

A window opens in which you enter a Zone DB Name that identifies the copy of the active Fabric Zone DB you are saving to the offline repository. You should establish a standard naming convention and ensure that it is enforced. In this example we use the initials of the person making the change, followed by the date the change is being made, followed by the name of the active zone configuration.
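The naming convention in this example (initials, then the date, then the active zone configuration name) is easiest to enforce when every administrator builds the name the same way. The sketch below is a hypothetical helper, not a Network Advisor feature; it assumes the date is written as YYMMDD, which is how the example name RJP_120610_SANWEST_X_CURRENT appears to encode it, and it only prints the name to type into the Save As dialog.

```shell
#!/bin/sh
# Hypothetical helper: builds an offline Zone DB name as
# INITIALS_YYMMDD_ACTIVECONFIG, matching the example RJP_120610_SANWEST_X_CURRENT.
# $1 = initials, $2 = active zone configuration name,
# $3 = optional date stamp (YYMMDD); defaults to today's date.
zone_db_name() {
    initials=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    cfg=$2
    stamp=${3:-$(date +%y%m%d)}
    if [ -z "$initials" ] || [ -z "$cfg" ]; then
        echo "usage: zone_db_name INITIALS ACTIVE_CFG [YYMMDD]" >&2
        return 1
    fi
    # Reject a date stamp that is not exactly six digits.
    case "$stamp" in
        [0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "date stamp must be YYMMDD" >&2; return 1 ;;
    esac
    printf '%s_%s_%s\n' "$initials" "$stamp" "$cfg"
}
```

For example, zone_db_name rjp SANWEST_X_CURRENT 120610 prints RJP_120610_SANWEST_X_CURRENT; omitting the third argument uses the current date.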

Once you respond OK to save, you will be presented with the following screen. It is VERY important at this point to notice that the Zone DB listed below is the Zone DB you just saved to the offline repository: RJP_120610_SANWEST_X_CURRENT.

Now that you have saved a copy of the current configuration, you can make your updates to the active Fabric Zone DB. It is VERY important to now go back into the Fabric Zone DB, because the Fabric Zone DB is the Zone DB you always want to make your changes to. Zone DB > select Fabric Zone DB from the list (See below)

You will now see that the Fabric Zone DB is listed in the top middle of the screen. See below.

Now that you have saved a copy of the current active Fabric Zone DB to the offline repository and confirmed that you are again editing the active Fabric Zone DB, you are ready to implement your change. In this example you will create a new zone, Zone 4, add it to the current active zone configuration, and activate the zone configuration so that the change takes effect in the fabric.

Create the new zone and name it Zone 4: New Zone > type Zone 4 as the name (See below)

Proceed to add the new members to the zone, see below.

Add the newly created Zone4 to the active configuration, see below.

Activate the zone configuration so that your changes are pushed to the fabric. You will be presented with a window that displays the changes you are about to activate in the fabric. You need to VERIFY that these changes are correct and respond OK once you have completed the verification.

Highlight the current zone configuration > Activate > respond OK once you have verified the intended changes are accurate (See below)

You will now see that Zone4 is active in your fabric zone configuration, see below.

Should you realize a mistake was made, you can fall back to the zone configuration that you saved to the offline repository.

Zone DB > select the Zone DB you wish to activate from the list; in this example it is RJP_120610_SANWEST_X_CURRENT (See below)

You will now see the Zone DB that you want to fall back to listed. Review the Zone Configuration to assure it is the version you wish to fall back to. Notice the yellow triangle in the Active Zone Configuration tab below. This is a warning to tell you that there is a difference between what is currently active in the fabric and the Zone DB that you are editing in Network Advisor.

Once you have verified that the fallback zone configuration is correct, proceed to activate it: highlight the zone configuration you wish to activate and click Activate (See below)

You will see a new window displaying the changes to the fabric that will be implemented. After you verify this is accurate, click OK and the changes will be activated in the fabric. You will need to reply YES to a verification window that comes up in order to activate the new configuration.

You will now see the active configuration no longer displays Zone 4. This is the state you were in prior to making changes to add Zone4 to the configuration. The current Zone DB listed is the copy you saved in the offline repository.

You will want to refresh this screen by selecting the Fabric Zone DB to show what is currently active in the fabric. Zone DB > select the Fabric Zone DB (See below)

You will now see the Fabric Zone DB displayed, showing what is active in the fabric. You have successfully fallen back to the point you were at prior to beginning the changes.

As part of this procedure you will save many copies of the Zone DB to the offline repository, so you should establish a policy for cleaning up the offline repository: determine the number of copies to keep and delete older copies as necessary. To delete an unwanted copy of a Zone DB from the offline repository, first select the Zone DB you wish to delete.

Zone DB > select the Zone DB to be deleted from the list; in this example, RJP_120210_SANWEST_X_CURRENT (See below)

You will now see the Zone DB RJP_120210_SANWEST_X_CURRENT listed in the Zone DB field.

You can now delete this Zone DB.

Zone DB Operation > select Delete from the list (See below)

You will now receive a window indicating you are removing this Zone DB from the offline repository. Respond Yes to remove it from the offline repository.
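To apply the clean-up policy consistently, you can work out which saved copies to delete before opening the Zone DB list. The sketch below is a hypothetical helper, not a Network Advisor feature. It assumes the saved Zone DB names follow the INITIALS_YYMMDD_CONFIG convention used in this procedure, reads names one per line, ignores names that do not follow the convention (such as the Fabric Zone DB itself), keeps the newest N, and prints the rest as deletion candidates. The deletion itself is still done in Network Advisor with Zone DB Operation and Delete, as described above.

```shell
#!/bin/sh
# Hypothetical helper: prints offline Zone DB copies that fall outside a
# "keep the newest N" retention policy.
# Input: one Zone DB name per line on stdin, e.g. RJP_120610_SANWEST_X_CURRENT.
# Assumes the second underscore-separated field is a YYMMDD date and that all
# dates are in the same century, so numeric order equals chronological order.
stale_zone_dbs() {
    keep=$1
    # Keep only names that follow the INITIALS_YYMMDD_ convention,
    # sort newest first on the date field, then skip the first $keep names.
    grep -E '^[A-Za-z]+_[0-9]{6}_' |
        sort -t _ -k2,2nr |
        awk -v k="$keep" 'NR > k'
}
```

For example, given RJP_120210_SANWEST_X_CURRENT, RJP_120610_SANWEST_X_CURRENT and ABC_120405_SANWEST_X_CURRENT with a retention of 2, only RJP_120210_SANWEST_X_CURRENT is listed for deletion.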

Conclusion

This document was designed to provide guidance on deploying IBM Network Advisor according to IBM best practices. Additionally, guidance for maintenance, monitoring, and performance has been included. This guide is not intended to replace any of the current documentation that IBM and Brocade have released in support of this product.

References

Below are links to references found in this document in addition to Network Advisor-specific links at Brocade and IBM.

Brocade Network Advisor SAN User Manual: all the features in Network Advisor, and their usage, are described here

IBM Network Advisor Software Link to IBM Network Advisor overv
