Designing a Routed Fibre Channel Storage Area Network

STORAGE AREA NETWORK

Architectural Brief: Designing a Routed Fibre Channel Storage Area Network This paper explains a new feature called Integrated Routing enabled on select 8 Gbit/sec platforms with Brocade Fabric OS 6.1.0. It provides reference architectures and guidance for designing Storage Area Network (SAN) solutions using Brocade Fibre Channel Routing (FCR) technology.


CONTENTS

Introduction

FC Routing Product Overview
  Brocade DCX and FR4-18i with Integrated Routing
  Brocade 5300 and 5100 Switches with Integrated Routing
  Brocade 48000 Director with FR4-18i Blade
  Brocade 7500 Extension Switch

SAN Routing Overview
  Edge Fabrics
  EX_Ports and FIDs
  Exporting and LSANs
  Backbone Fabrics
  FC-NAT and Logical Domains
  Terminology Review

SAN Routing Use Cases
  Small-Scale Use Cases
  Large-Scale Use Cases
    Edge-to-Edge and Peer-to-Peer Design
    Centralized Resources
    Multi-Site DR Solution

General Design Considerations
  HA Design
    No fabric redundancy with one FCR platform
    No fabric redundancy with two or more FCR platforms
    Dual-redundant fabrics each with one FCR platform
    Dual-redundant fabrics each with two or more FCR platforms
  Interoperability
  Scalability
    General Nature of Scalability Limits
    Fault Containment
    Fabric Services
    Allowable Number of Edge Fabrics
    Per-Chassis Scalability
  Complex Topologies
  FICON

Summary


INTRODUCTION

Storage Area Networks are networks primarily intended to provide connectivity between hosts and storage devices such as disks, RAID (Redundant Array of Inexpensive Disks) arrays, and tape libraries. SANs, unlike traditional Local Area Networks (LANs), simply cannot lose data in transport. Applications and operating systems are generally tolerant of dropped packets and variable performance on the LAN, because software that relies on the LAN was built with the understanding that LANs sometimes become congested and drop data. Storage, by contrast, was originally directly attached, so these same software components assumed that storage would be fast and reliable, because it was reached over a point-to-point connection rather than a network.

Because of the more stringent performance and reliability requirements inherent in SANs, the industry as a whole decided more than a decade ago to standardize on Fibre Channel (FC) as a SAN transport. As a practical matter, it therefore would be more accurate to define a SAN as a high-performance, low-latency, high-reliability, and lossless network, designed primarily to enable block-level host-to-storage (initiator-to-target) connectivity, most often consisting of FC devices (switch, director, backbone), Host Bus Adaptors (HBAs), and storage devices.

As organizations continued to expand they required more comprehensive solutions than traditional FC fabrics were providing. To address these needs, Brocade® developed routing services, which increase SAN functionality, scalability, and versatility. When a set of traditional FC switches are interconnected, they form one network segment called a fabric, with a single name server space, one zoning “realm,” one Fabric Shortest Path First (FSPF) database, one PID (Port ID) format, one routing table, one set of timeout values, and so on.

An FC router enables a two-tier hierarchy of fabrics and solves many of the problems associated with non-routed Layer 2 networks. An FC router is to an FC fabric as an IP (Internet Protocol) router is to an Ethernet segment. Brocade Fibre Channel Routing (FCR) enables a hierarchical network design for Fibre Channel that allows SANs to scale, isolates faults, and makes fabrics more manageable.

NOTE: This paper discusses topics that are treated in greater detail in the Fabric OS Administrator’s Guide for version 6.1. It is recommended that you consult that product manual when you are ready to deploy a routing solution.

Where have all the SAN islands gone? SAN islands were omnipresent during the early years of Fibre Channel, when FC networks were just starting out and tended to be small. An enterprise working to deploy a new application would purchase the equipment for that application, including the array and switches, forming a new SAN island. This occurred over and over until the enterprise was faced with far too many assets to manage and terrible inefficiencies. This led to efforts to consolidate onto larger arrays and directors. Consolidation continues to this day and now includes newer server consolidation technologies.

Brocade FC routing products meet deep, real-world customer needs in ways that are optimized for the demands of mission-critical storage traffic.

This paper is for IT professionals who are considering a routed SAN solution or who want to broaden their knowledge of SAN routing technologies. It covers the following topics:

• Overview of Brocade FCR products

• Discussion of the architecture of the routing service

• Specific examples of routed SAN solutions

• Design considerations


FC ROUTING PRODUCT OVERVIEW

Brocade offers FCR as a base feature on the Brocade 7500 Extension Switch and the Brocade FR4-18i Extension Blade, with no additional licensing required. New Brocade ASICs, which enable 8 Gbit/sec port speeds, support a new capability, Integrated Routing (IR), on all 8 Gbit/sec blades (FC8-16, FC8-32, and FC8-48) and certain 8 Gbit/sec switches with the release of Fabric OS® (FOS) 6.1. Integrated Routing offers the same capabilities as FCR, except that it is user-configurable per port on FC8 blades in the Brocade DCX Backbone and on Brocade 5300 and 5100 Switches. (The Brocade 300 and embedded 8 Gbit/sec switch modules do not offer IR.)

Brocade DCX and FR4-18i with Integrated Routing

The Brocade DCX Backbone, shown in Figure 1, is a revolutionary new product offering massive performance, scalability, and the flexibility to enable mission-critical production deployments to grow and evolve both physically and virtually. The internal switching fabric supports 4 Tbit/sec (4096 Gbit/sec) aggregate bandwidth across the backplane and supports full-performance local switching. The Brocade DCX running FOS 6.1 supports up to eight FC8 hot-swappable blades, which all optionally support IR.

The Brocade FR4-18i has 16 FC ports that support U_Ports and EX_Ports, plus 2 Gigabit Ethernet (GbE) ports, each of which supports up to 8 FCIP (FC over IP) tunnels. The FC ports can be used for the attachment of end devices (F_Ports to N_Ports), for the connection of Inter-Switch Links (ISLs) via two E_Ports, or for FC-to-FC routing over Inter-Fabric Links (IFLs) via an EX_Port and an E_Port.

Integrated Routing is enabled with the release of Fabric OS® 6.1. The FC routing protocol is the same on Brocade 7500/FR4-18i EX_Ports and on IR EX_Ports, so the two are fully interoperable when connected to the same fabrics. As of FOS 6.1, IR can be activated on up to 128 ports per Brocade DCX chassis. When two Brocade DCX chassis are connected via Inter-Chassis Links (ICLs), a total of 256 IR ports can be available. If a Brocade FR4-18i is installed in a Brocade DCX and an IR EX_Port has been enabled, EX_Ports on the Brocade FR4-18i cannot be enabled at the same time, and vice versa. If FCR is implemented without FCIP, IR is preferred over the Brocade FR4-18i, due to the greater flexibility and port speed (8 Gbit/sec vs. 4 Gbit/sec) of IR.

NOTE: FCIP using the Brocade FR4-18i is not precluded with IR, and VEX_Ports and VE_Ports continue to be fully supported.

As of Fabric OS 6.1, up to 4 Brocade FR4-18i FC Extension Blades can be installed into the Brocade DCX. This blade turns the chassis into a fully-integrated FC switching and multiprotocol routing platform. The DCX can be fitted with other blades and supports up to 352 ports with one Brocade FR4-18i.

Figure 1. Port side of a populated Brocade DCX Backbone with Integrated Routing


Brocade 5300 and 5100 Switches with Integrated Routing

Starting with Brocade FOS 6.1.0, the Brocade 5300 and 5100 Switches also support IR. No additional hardware is required, but a software license key is needed to activate this feature. All of the ports on the Brocade 5300 (up to 80 ports) and Brocade 5100 (up to 40 ports) can be IR enabled via user configuration.

Figure 2. Brocade 5300 (top) and Brocade 5100 (bottom), both 8 Gbit/sec with IR capabilities

Brocade 48000 Director with FR4-18i Blade

The Brocade 48000 Director, shown in Figure 3, is a widely adopted, high-performance, high-reliability, and very power-efficient director. As of FOS 6.1, up to two Brocade FR4-18i blades can be integrated into the Brocade 48000. The Brocade 48000 can be fitted with other blades and supports up to 352 ports when one FR4-18i has been installed. The Brocade 48000 supports 1 Tbit/sec (1,024 Gbit/sec) aggregate bandwidth across the backplane.

Figure 3. Brocade 48000 Director and FR4-18i Extension Blade and FCIP gateway

Brocade 7500 Extension Switch

The Brocade 7500, shown in Figure 4, is a fixed-configuration, 16-port 4 Gbit/sec FC switch with 2 GbE ports for FCIP connectivity. All the ports have the same capabilities as the FR4-18i. In addition to being qualified to perform typical FC switching functions, the Brocade 7500 can route FC across all 16 ports and across the GbE ports over FCIP tunnels. The 16 FC ports auto-negotiate 1, 2, or 4 Gbit/sec speeds. The internal architecture is non-blocking and supports up to 128 Gbit/sec of aggregate internal bandwidth. Extended fabrics using FCR can also benefit from FC FastWrite (FC-FW), which is an acceleration technique for SCSI writes. Contact your Brocade Systems Engineer (SE) for additional FC-FW details.


Figure 4. Brocade 7500 Extension Switch and FCIP gateway

SAN ROUTING OVERVIEW

Brocade FCR provides connectivity between two or more fabrics without merging them. In other words, the fabrics maintain separate fabric-wide services and the switches in each fabric do not communicate with switches in other fabrics. The router allows for the creation of Logical Storage Area Network (LSAN) zones, which provide node connectivity that spans these unmerged fabrics. Because a router can connect nodes in autonomous fabrics without merging the fabrics, it offers significant advantages for change management, network management, scalability, interoperability, reliability, availability, and serviceability.

An FC router is like a switching firewall or a Layer 3 switch with hardware-enforced access control. FC routers provide selective data connectivity via LSAN zones and prevent fabric services from propagating between fabrics. Thus, disruptions to a fabric are isolated and other fabrics are not adversely impacted. Another benefit of isolating services is that the FC router feature allows each fabric to maintain independent administration. This prevents administrator errors from propagating between fabrics; it also isolates hardware and software failures.

NOTE: In this paper, the terms “FC router” and “router” are both used to describe a Brocade platform with routing enabled.

Router advantages, none of which would be true for a single large fabric, include:

• The scaling of a routed fabric has no effect on the other routed fabrics.

• If routed SANs are constructed with traditional switches, the overall network size can vastly exceed that of a conventional, non-routed fabric.

• Fabric reconfigurations do not propagate between edge fabrics.

• Faults in fabric services are contained.

• Problems caused by an errant HBA are contained.

• Interoperability between different SAN operating systems can be achieved.

• Zoning errors do not propagate.

Edge Fabrics

A fabric is formed by a standalone FC switch or by FC switches that are connected directly to each other via ISLs. When fabrics are connected through the intermediary of a router, they are called edge fabrics. SAN is the term used to refer to one or more fabrics, routed or not routed, that connect a common set of nodes. Most enterprises implement a SAN with two (referred to as “dual”) redundant disk fabrics, shown in Figure 5 as Fabric 1 and Fabric 2. Each of the redundant networks could include multiple edge fabrics. Figure 5 illustrates two generic edge fabrics comprising two redundant FC routers. (In the real world, the number of edge fabrics is usually greater than two.)


Figure 5. Generic routed fabric architecture

EX_Ports and FIDs

Switches in a fabric interconnect using E_Ports and the resulting connection is an Inter-Switch Link. Similarly, FC routers connect to edge fabrics using EX_Ports. An EX_Port is a demarcation point that marks the end of an autonomous fabric. Such a fabric can be attached to one or more routers. An E_Port (fabric side)-to-EX_Port (router side) connection is an Inter-Fabric Link.

EX_Ports are like normal E_Ports from the perspective of the edge fabric and the protocols that run across them are typical of all Brocade E_Ports. The distinction is that EX_Ports limit what each edge fabric sees in terms of devices and services from other edge fabrics. This is accomplished by the EX_Port appearing to the edge fabric like a regular FC switch (called a Front Domain) and proxy devices that use Fibre Channel Network Address Translation (FC-NAT). Services in the edge fabric cannot extend beyond the logical domains created for the purpose of representing proxy devices.

To differentiate between edge fabrics, each EX_Port has a user-configured Fabric Identifier (FID), which specifies a unique label for the fabric to which it is attached. An edge fabric can be thought of as being the “FID x” fabric in much the same way that an IP subnet can be thought of in terms of its subnet address. EX_Ports must have the same FID if they are attached to the same edge fabric, or the IFLs will partition. In the example above, all EX_Ports on both routers connected to Fabric 1 have FID=1 set. There are four FID=1 EX_Ports across the two routers and four FID=2 EX_Ports.
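The FID-consistency rule above can be sketched in a few lines. This is an illustrative model, not a Fabric OS tool; the router and port names are hypothetical, and only the rule itself (all EX_Ports attached to one edge fabric must share a FID) comes from the text.

```python
# Toy model: every EX_Port attached to the same edge fabric must carry the
# same FID, or the IFLs into that fabric will partition.

def check_fid_consistency(ex_ports):
    """ex_ports: list of (router, port, fabric, fid) tuples.
    Returns the list of misconfigured ports (empty means consistent)."""
    fids_per_fabric = {}
    conflicts = []
    for router, port, fabric, fid in ex_ports:
        expected = fids_per_fabric.setdefault(fabric, fid)
        if fid != expected:
            conflicts.append((router, port, fabric, fid, expected))
    return conflicts

# The example in the text: four FID=1 EX_Ports and four FID=2 EX_Ports
# spread across two redundant routers.
ports = [(r, p, f, fid)
         for fid, f in ((1, "Fabric 1"), (2, "Fabric 2"))
         for r in ("router1", "router2")
         for p in (0, 1)]
assert check_fid_consistency(ports) == []        # consistent: no partitions
ports.append(("router2", 2, "Fabric 1", 9))      # misconfigured FID
assert check_fid_consistency(ports) != []        # this IFL would partition
```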


Exporting and LSANs

The act of projecting a node into another edge fabric via a proxy is called “exporting.” When a host is exported from Fabric 1 to Fabric 2, it must also be exported from Fabric 2 to Fabric 1, because FC communications are always bidirectional. When an LSAN zone is created, exporting from both edge fabrics occurs. Figure 6 shows a pair of devices that have been mutually exported between the two edge fabrics.

Figure 6. Exporting devices

When a set of devices on different edge fabrics are allowed to communicate through an FC router, the resulting connectivity group is an LSAN zone. The LSAN zone entity associated with the export example in Figure 6 is illustrated in Figure 7. Many different LSAN zones can exist between edge fabrics in the same way that many zones can exist in a single fabric. Devices can be members of multiple LSAN zones and LSAN zones can overlap with traditional zoning within local fabrics as well.

Figure 7. LSAN zones


Day-to-day administration of LSAN zones is performed using familiar zoning tools in each edge fabric. This allows existing tools from Brocade or third parties to work as usual, minimizing the need to retrain SAN administrators. Special LSAN zones in edge fabrics are created to indicate to the router which devices should be exported. For an initiator to perform a PLOGI (port login) to a target, an LSAN zone is created on each edge fabric containing those particular devices. For nodes in two edge fabrics to communicate, there must be two LSAN zones—one on each fabric. This method of routing nodes can be extended to three, four, or more fabrics just by creating more LSAN zones in the appropriate edge fabrics.
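The one-zone-per-participating-fabric rule described above can be expressed as a small sketch. The device placement data and function name here are hypothetical illustration, not a Brocade API; only the rule itself comes from the text.

```python
# Sketch of the "define the LSAN zone in every participating edge fabric"
# rule: each edge fabric hosting a member of the zone needs its own copy
# of the matching LSAN zone definition.

def zones_required(zone_name, members, device_fabric):
    """Return {edge fabric: zone definition} for every fabric with a member.

    device_fabric maps each member pWWN to the fabric where it resides."""
    fabrics = {device_fabric[m] for m in members}
    return {f: (zone_name, sorted(members)) for f in fabrics}

placement = {"host_pwwn": "Fabric 2", "disk_pwwn": "Fabric 1"}
defs = zones_required("LSAN_backup", ["host_pwwn", "disk_pwwn"], placement)
assert set(defs) == {"Fabric 1", "Fabric 2"}    # one zone per edge fabric
assert defs["Fabric 1"] == defs["Fabric 2"]     # identical membership in each
```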

Figure 8. LSAN zones

To an edge fabric, LSAN zones are indistinguishable from traditional zones, which is why they are compatible with previous Brocade Fabric OS and Brocade M-Enterprise OS (M-EOS) versions. There are only two distinguishing features of an LSAN zone:

• First, they must begin with the prefix “LSAN_” so that routers and administrators can recognize them.

• Second, they must contain only port World Wide Names (pWWNs) or device aliases that map to a pWWN (since pWWNs are globally unique). As a best practice, LSAN zones should contain only devices intended for inter-fabric sharing.
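The two distinguishing features above lend themselves to a quick validity check. The sketch below is illustrative only (the function and the sample WWNs are invented for this example); it simply encodes the two rules stated in the bullets: the "LSAN_" prefix and pWWN-only membership.

```python
import re

# Illustrative check (not a Brocade tool): an LSAN zone must be named with
# the "LSAN_" prefix and contain only port WWNs, since pWWNs are the only
# globally unique identifiers the routers can translate between fabrics.

PWWN_RE = re.compile(r"^([0-9a-f]{2}:){7}[0-9a-f]{2}$", re.IGNORECASE)

def is_valid_lsan_zone(name, members):
    return name.startswith("LSAN_") and all(PWWN_RE.match(m) for m in members)

assert is_valid_lsan_zone("LSAN_oracle1", ["10:00:00:05:1e:34:12:ab",
                                           "50:06:01:60:3b:e0:12:34"])
assert not is_valid_lsan_zone("oracle1", ["10:00:00:05:1e:34:12:ab"])  # no prefix
assert not is_valid_lsan_zone("LSAN_bad", ["1,3"])   # domain,port entries not allowed
```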


Backbone Fabrics

When multiple edge fabrics are connected or if remote fabrics are linked over long distances using FCIP, routers can incorporate a centralized backbone fabric, as shown in Figure 9 (not to be confused with the Brocade DCX Backbone, shown in Figure 1).

Figure 9. FCR backbone fabric connection to a number of edge fabrics

Backbone fabrics are used to expand scalability of connections to edge fabrics and to distribute that connectivity across a campus or further across distant geographical locations. In Figure 9, multiple edge fabrics are on the top tier, routers in the middle, and a standard fabric interconnects the routers. The backbone allows a host in Fabric 1 to be exported to Fabric 14, even when those edge fabrics are not attached to any common FC routers. Notice that routers do not use EX_Ports for router-to-backbone connections; they use standard E_Ports. Also, when a host or storage device is connected via F_Port or FL_Port interfaces directly to a router or director with an FCR blade installed, that device is in the backbone fabric.

The ability to route into and out of the backbone fabric is an important feature, particularly when using IR and the FR4-18i blade. With those products, all of the non-router ports in the chassis are part of the backbone fabric, as described in the examples below:

• First, a Brocade DCX containing six FC8-32 blades and two FR4-18i blades has 32 FR4-18i FC ports, which could connect to various edge fabrics as EX_Ports or to nodes as F_Ports; four GbE ports supporting up to 32 FCIP tunnels; and 192 traditional 8 Gbit/sec FC ports. The 192 8 Gbit/sec FC ports plus all the FR4-18i ports not configured as EX_Ports would be in the backbone.

• Second, if a Brocade DCX used IR and a certain number of ports were configured as EX_Ports, the remaining ports would be in the backbone. Using backbone-to-edge routing allows nodes to be connected to backbone ports and routed to edge fabrics. LSAN zones can be configured between these devices and edge fabrics, in the same way as if the backbone were another edge fabric.

It is also possible to build redundant parallel backbone fabrics for High Availability (HA) applications. Each router can have only one backbone attachment; however, multiple routers can each have a separate backbone. If one router were connected to different backbone switches, the two backbone fabrics would merge into a single fabric through the router.


For HA deployments, it is recommended that you deploy independent dual-redundant backbone fabrics. For customers who implement dual-redundant fabrics today, there is usually little or no benefit to connecting the fabrics together, even with a router. There is an availability and reliability benefit from not doing so. Dual-redundant fabrics should remain physically isolated from each other to prevent a situation in which an unforeseen error can bring down the entire network. Enterprises with dual-redundant fabrics should implement at least two backbone fabrics, one for each of the redundant fabrics. If each fabric had resiliency, that is, two routers per fabric, there would be a total of four backbone fabrics, two for one of the redundant fabrics and two for the other.
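The backbone-count arithmetic above reduces to a one-line rule, sketched here for clarity (the function name is invented for this illustration):

```python
# Each router carries exactly one backbone attachment, and dual-redundant
# fabrics stay physically isolated, so independent backbone fabrics
# multiply out per router.

def backbone_fabrics(redundant_fabrics, routers_per_fabric):
    return redundant_fabrics * routers_per_fabric

assert backbone_fabrics(2, 1) == 2   # dual fabrics, one router each
assert backbone_fabrics(2, 2) == 4   # dual fabrics with router resiliency
```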

From the point of view of devices on the backbone fabric, only one backbone fabric can be traversed. A backbone fabric can never be connected through a router to another backbone fabric to reach a final destination. This architecture, called “Multi-Hop Routing,” is not supported.

FC-NAT and Logical Domains

FC-NAT automatically determines the proxy PID presented to the foreign edge fabric for the device that has been exported. This is important because it affects addressing and how an administrator will identify these devices, for example, when looking at the edge fabric’s name server.

Fibre Channel fabrics traditionally have a physical topology, in which physical ports of physical devices connect via physical cables to physical ports of other physical devices. FSPF discovers the logical topology of the switches by correlating links between those switches and storing that information in a database. All switches discover an identical version of the database. The logical topology is run through the Dijkstra algorithm to calculate the routing table from the perspective of that specific switch, producing a unique routing table. Prior to SAN routers, physical and logical topologies were identical.

The SAN router introduces a new model, in which logical domains are presented. These logical domains contain proxy devices. Logical domains are virtual independent switches created in software within the router. From the perspective of the edge fabrics, the logical domains are just another switch in the fabric. They do not correlate directly to any physical entities; nonetheless, logical domains are presented to the edge fabric via protocols such as FSPF LSAs (Link State Advertisements), NS (Name Server), RSCNs (Registered State Change Notifications), Management Server, and so on.

A device that has been exported from one edge fabric and into another is called a “proxy device”. Proxy devices are brought online by routers when LSAN zones dictate their creation and the devices specified by the zones are online. There is no need to bring a proxy device online if the real devices are not online. End devices are notified when proxy devices are brought online or offline through normal fabric mechanisms such as RSCNs and name server registrations.

Figure 10 extends the example from Figure 6 to show the way Fabric 2 would look from Fabric 1’s point of view. Each EX_Port connected to Fabric 1 projects a Front Domain (FD), and each FD projects a route into the edge fabric to the Translation Domains (TDs). Each TD represents an entire remote edge fabric. The EX_Ports for an FID will send FSPF LSAs for all the TDs, which results in calculated routes to the TDs in all the switches in the FID. In some cases, TDs are referred to as “xlate” domains. Even if a remote edge fabric contains 30+ FC switches, there will be just one TD representing the entire fabric.

All EX_Ports connected to the same FID from the same switch will receive the same domain ID for the FD; in this case, router 1’s FD gets domain 1 and router 2’s FD gets domain 2. These FDs are assigned by the principal switch located within the edge fabric using traditional means of domain ID assignment. Assigning a single domain ID to each FD with multiple EX_Ports is referred to as FD Consolidation.

The routers coordinate with each other using FCRP (FC Routing Protocol) to decide how to consistently present devices from remote edge fabrics that they have access to. In this case, since they both have access to Fabric 2, one of the routers will present that domain to the principal switch for assignment (referred to as Request Domain ID or RDI) and receives domain 5; now all of Fabric 2 is represented by domain 5. FCRP communicates between routers 1 and 2 to ensure that the domain for edge fabric 2 is consistent and that the PIDs assigned to proxy devices are consistent on both attached routers. Obviously, if the same device was represented by two different proxy devices, one in each router, determining which one to post in the name server and which one to use for zoning would be problematic. Moreover, if the router with the active entries were to go offline, communications would fail. It is essential that the domains and devices remain in synchronization.

Figure 10. Internal topology

All exported devices must be translated via FC-NAT into the respective TD. The TD domain ID becomes the first byte (domain byte) of the PID. To maximize the FC-NAT address space, translated devices are given FC PIDs using both the area and port bytes. The combined field is incremented by one for each proxy added to the domain. Proxy addresses look like public loop devices, even though they are really N_Ports in their source fabrics, as shown below.

Byte 1: Domain ID (0–239)
Byte 2: Area ID (0–255)
Byte 3: Port ID (00 or 80, which allows addressing over 256 ports per platform)

Whatever PID is assigned to a given exported device is persistent. If all power were lost in the data center, causing every host, storage array, and network device (including the routers) to be rebooted simultaneously, no TD PIDs would change. Furthermore, the translation table can be saved and reloaded using the configUpload and configDownload commands. Even in the event of a catastrophic failure of all SAN routers, such that they have to be replaced, the mappings can be restored. This may be important for operating systems that depend on the proxy PID remaining constant. The routers also provide commands to manually set TD PIDs, if that is required.
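The persistence property above can be illustrated with a toy allocator. The three-byte PID layout follows the format shown earlier; the exact slot-allocation order is an assumption made for this sketch, not the documented Fabric OS behavior, and the class and WWN values are invented for illustration.

```python
# Toy model of proxy PID assignment under a Translation Domain (TD).
# The slot layout (area byte plus a port byte of 0x00 or 0x80) follows the
# byte format above; the allocation order is an assumption of this sketch.

class TranslateDomain:
    def __init__(self, domain_id):
        self.domain_id = domain_id   # becomes byte 1 of every proxy PID
        self.table = {}              # pWWN -> PID; persisted across reboots

    def proxy_pid(self, pwwn):
        if pwwn not in self.table:   # persistence: existing entries never move
            slot = len(self.table)
            area, port = slot >> 1, 0x80 * (slot & 1)
            self.table[pwwn] = (self.domain_id << 16) | (area << 8) | port
        return self.table[pwwn]

def fmt(pid):
    return "%02x-%02x-%02x" % (pid >> 16, (pid >> 8) & 0xFF, pid & 0xFF)

td = TranslateDomain(0x05)
host = "10:00:00:05:1e:34:12:ab"
first = fmt(td.proxy_pid(host))
assert first == "05-00-00"
td.proxy_pid("50:06:01:60:3b:e0:12:34")   # second proxy gets the next slot
assert fmt(td.proxy_pid(host)) == first   # repeated lookups never change a PID
```

Keying the table on pWWN rather than the device's real PID mirrors the note below: the pWWN is the only identifier that stays unique and untranslated across fabrics.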

NOTE: A device’s real PID is nearly always different from its proxy PID. A device’s nWWN (node World Wide Name) and pWWN (port World Wide Name) are always the same and are never translated; therefore, a pWWN is a unique identifier in a SAN, while a PID is not.

In Figure 10, the disk in Edge Fabric 1 might have a PID of 13-01-00 if it were attached to domain 13, port 1. The host in Edge Fabric 2 could also have a PID of 13-01-00, since the same domain and port can exist in both fabrics. When the host is projected into Edge Fabric 1, it is translated to domain 5, so an example PID could be 05-00-01. A Fibre Channel analyzer on Edge Fabric 1 would see the local conversation as being between 13-01-00 and 05-00-01, regardless of the real PID the host had in Edge Fabric 2. Determining which devices are involved requires looking at the pWWNs using element management on the routers or SAN management software.
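The analyzer scenario can be sketched in a few lines (hypothetical Python; the table contents and pWWNs are invented): mapping trace PIDs back to devices is just a lookup against the router's pWWN translation table, which is why pWWNs, not PIDs, uniquely identify devices.

```python
# Toy translation table as it might look from Edge Fabric 1's viewpoint:
# local devices keep their real PIDs, remote devices appear as proxies.
translation_table = {
    "13-01-00": "21:00:00:11:22:33:44:55",  # local disk, real PID
    "05-00-01": "10:00:00:aa:bb:cc:dd:ee",  # proxy for host in Edge Fabric 2
}

def resolve(frame_src, frame_dst):
    """Map a (src, dst) PID pair captured in a trace to port WWNs."""
    return translation_table[frame_src], translation_table[frame_dst]

src_wwn, dst_wwn = resolve("13-01-00", "05-00-01")
print(src_wwn, dst_wwn)
```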

Backbone fabrics present a slight variation on this model, since they do not use an FD. An edge fabric attaches to one or more FDs via external EX_Ports and remains a separate fabric, demarcated by those EX_Ports. Backbone fabrics communicate with the routing service through embedded ports within the router. The backbone fabric is a single entity no matter how many routers and switches are connected: router ports connecting to the backbone are all E_Ports, so all switches, including the routers themselves, merge into a single fabric. (The edge fabrics on the EX_Port side do not merge.) Because the internal FC switch portion of each router actually becomes part of the backbone fabric, FDs are not needed to demarcate the backbone fabric. Through the embedded router ports, the backbone fabric sees a TD for every edge fabric for which a mutual LSAN zone has been configured.

Terminology Review

Term Definition

Backbone Routers allow a backbone fabric to interconnect other routers for a more scalable and flexible routed SAN. Each router can have many edge fabric connections but only one backbone fabric. Routers connect to the backbone fabric via E_Ports, and all N_Port and NL_Port connections to a router are part of the backbone fabric. With IR on the Brocade DCX, or an FR4-18i blade installed in a Brocade DCX or 48000, a large number of hosts and storage devices can be connected to the backbone fabric.

NOTE: Do not confuse the use of the word “backbone” in the term “backbone fabric” with the Brocade DCX Backbone platform, shown in Figure 1.

EX_Port FC routers use EX_Ports instead of E_Ports as routed interfaces and to demarcate the edge of a fabric. To connect a router to a switch in an edge fabric, connect an EX_Port to an E_Port in the edge fabric. EX_Ports limit fabric services from propagating.

FCR Fibre Channel Routing is a method of establishing host and storage communications between fabrics that remain autonomous.

FID A unique identifier for each edge fabric. Two routers connected to the same edge fabric must designate the same FID for that edge fabric.

FD A Front Domain (FD) is a logical domain created at EX_Ports. FD consolidation uses a single domain ID for all EX_Ports on the same router connected to the same FID. From the perspective of FSPF, an edge fabric connects to the FD, which in turn connects to the TD.

IR Integrated Routing is the same as FCR, except that the routing function has been integrated into the 8 Gbit/sec switching ASICs; it is supported on the Brocade DCX Backbone and the Brocade 5300 and 5100 Switches.

ISL The connection between two E_Ports is called an Inter-Switch Link.

IFL The connection between an EX_Port and an E_Port is called an Inter-Fabric Link.

Edge fabric A traditional FC fabric connected to a router via an EX_Port (IFL). Edge fabrics are, for the most part, where hosts and storage are attached.

LSAN A Logical SAN is a group of correlated devices that are permitted to communicate between edge fabrics. Communications traverse only one EX_Port or VEX_Port when one of the fabrics is the backbone and exactly two EX_Ports if the end devices are both in edge fabrics.


LSAN zone LSAN zones are used to define the devices that participate in an LSAN and to configure connectivity across routers.

PID Port ID is the FC ID assigned to every port. The PID has 3 bytes: Domain | Area | Port. Traditionally, Brocade has set the Port byte equal to the ALPA to facilitate FC-AL. When connecting an N_Port, the Domain is the switch identifier, the Area is the port number, and the ALPA is 0. When assigned by a TD, the Domain is the TD domain ID, and the Area and ALPA bits combine to allow addressing of more than 65,000 devices per TD.

TD Translate Domain (TD) is a logical domain created behind FDs. Every edge fabric, and the backbone fabric, has a TD if a device in the edge fabric has an LSAN zone associated with it. TDs house the proxy devices created when devices in a remote edge fabric are exported, via LSAN zones, into the local edge fabric.

SAN ROUTING USE CASES

This section provides specific examples of Brocade FC routing solutions.

Small-Scale Use Cases

The simplest deployment case for FCR involves connecting two fabrics into a single backbone. Two fabrics are deployed for redundancy, as shown in Figure 11. Routing edge fabrics is usually not intended to solve scalability problems; it is most often used for:

• Interoperability issues

• Vendor support agreements

• Production status requiring isolation

• SAN administration issues

• Resource optimization

• Tape connectivity

Figure 11. Redundant routed fabrics


In this example, a customer needed to separate the production and pre-production environments to prevent problems in the pre-production fabric (for example, those caused by testing new switch firmware versions) from affecting the production systems. Nonetheless, they still needed connectivity between the environments so that they could efficiently load data from the production SAN onto their test systems, which were used to prototype enhancements to applications before releasing them into production.

To ensure HA, the customer implemented redundant fabrics, keeping their HA model intact. Each of two routers was connected to one of the two pairs of fabrics. Since the customer needed only 40 IFLs or fewer per backbone, they elected to use a pair of Brocade 5100 Switches with IR licenses. This design provided all the needed connectivity, with significant room for growth.

An alternative to adding more switches is to enable Integrated Routing on existing Brocade DCX, 5300, or 5100 platforms in Fabrics A1 and B2. Using Fabrics A1 and B2 places the demarcation for each edge fabric on opposite sides, reducing the chance of a SAN-wide outage in the event of an unforeseen issue on either side. This is another example of backbone/edge routing: when IR is configured in this example, one side becomes the backbone fabric and the other side becomes the edge fabric, and each side has one of each to mitigate an overall outage. Fabric A1 is the backbone fabric and presents an EX_Port to Fabric B1, while Fabric B2 is the backbone fabric and presents an EX_Port to Fabric A2.

One common variation on this configuration involves long-distance connectivity for Disaster Recovery (DR). Many customers need to connect a pair of fabrics at the primary data center to a pair of fabrics at a remote site to perform Remote Data Replication (RDR). Figure 12 shows an example of a customer use case with Brocade 7500 routers, and Figure 13 shows a similar design using the FR4-18i blade at the primary site.

Figure 12. Disaster recovery: variation 1

The example in Figure 12 is straightforward. The fabrics at each site are edge fabrics. The backbone fabrics, the router-to-router connections, run through the Wide Area Network (WAN), which is why the WAN links are labeled as "ISLs" instead of "IFLs." The WAN cloud in this example could comprise various transmission technologies, including native metro FC links over dark fiber, Dense Wavelength Division Multiplexing (DWDM), Synchronous Optical Network (SONET), or a long-distance FCIP solution. FCIP links are merely FC ISLs encapsulated in IP and, from an FC topology standpoint, should be treated as such. Unlike common connections to a fabric, connecting to a LAN or WAN does not cause FC fabrics to merge.


Figure 13. Disaster recovery: variation 2

Figure 13 re-introduces backbone-to-edge routing. This customer has two Brocade DCX chassis at the primary site and two Brocade 300 Switches at the DR facility, located 10 km away. A DWDM solution is in place between the two sites. Previously, the IT Manager had been using a service to collect backup tapes every day from the primary site and ship them to the DR site. She wanted to implement SAN-level connectivity between the sites to eliminate the physical transportation solution. There was also a requirement to provide isolation of fabric services and maintain administrative separation.

The most efficient solution is to enable IR on the Brocade DCX and connect the Brocade 300 Switches across the DWDM, as shown in Figure 13. In this configuration, each fabric at the production site becomes a backbone fabric with the DR site consisting of one edge fabric for each Brocade DCX Backbone at the primary site.

Large-Scale Use Cases

One of the driving forces behind developing a SAN router was to enhance scalability. Metcalfe’s Law broadly states that the value of a network grows in proportion to the square of its size: the larger a SAN becomes, the greater the value it provides to an organization. On the other hand, it is axiomatic that larger SANs are more difficult and expensive to build and require more resources to manage. This section shows example deployments using FCR technology and demonstrates how routers enhance scalability.


Edge-to-Edge and Peer-to-Peer Design

Figure 14 shows how a large SAN built around redundant routers would look.

Figure 14. Large-scale SAN

This SAN was designed to provide connectivity between 10 pre-existing fabrics in an island consolidation solution. One of the fabrics (FID=5) was quite large, while the others were relatively small or medium in size. The sum of the domains and ports in the 10 fabrics exceeded the supported scalability limits of any single fabric and was operationally impractical to manage; nevertheless, the enterprise needed node connectivity between all fabrics.

The previous method for handling connectivity between fabrics was to physically connect hosts or storage devices only for as long as inter-fabric connectivity was required. It was not practical to merge fabrics, even on a temporary basis, because of the significant risk of an outage. Installing FC routers permitted a "wire once" approach to SAN implementation: when data needed to be accessed across different fabrics, it was no longer necessary to rewire the SAN, only to add LSAN zones. "Disconnecting" inter-fabric communications requires only removing those LSAN zones.
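The "wire once" model can be sketched as follows (hypothetical Python, not Brocade code; the function names and pWWNs are invented for illustration): physical wiring stays fixed, while inter-fabric connectivity is toggled purely by LSAN zone membership.

```python
# Toy model of LSAN-zone-controlled connectivity (illustrative only).
# Devices are identified by pWWN; an LSAN zone is a set of pWWNs that
# are permitted to communicate across fabrics.

lsan_zones = {}  # zone name -> set of member pWWNs

def add_lsan_zone(name, members):
    lsan_zones[name] = set(members)

def remove_lsan_zone(name):
    del lsan_zones[name]

def can_communicate(pwwn_a, pwwn_b):
    """Two devices may talk across the routers only if some LSAN zone
    contains both of them -- no recabling involved."""
    return any(pwwn_a in z and pwwn_b in z for z in lsan_zones.values())

host = "10:00:00:00:c9:00:00:01"   # host in Fabric 9 (pWWN invented)
disk = "21:00:00:20:37:00:00:02"   # array port in Fabric 5 (pWWN invented)

add_lsan_zone("lsan_f9_f5", [host, disk])
print(can_communicate(host, disk))   # True
remove_lsan_zone("lsan_f9_f5")
print(can_communicate(host, disk))   # False
```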

This example shows how a very high degree of scalability can be achieved with even a relatively small number of routers. It also illustrates the scalability impact of routing on each fabric involved. Fabric 5, for instance, was a large and complex fabric to begin with. If a host from Fabric 9 were exported into Fabric 5, that would be a simple task compared to physical connectivity, with all the complications and instabilities associated with expanding the fabric even further. Even if every other edge fabric were as large and complex as Fabric 5, only a very small percentage of devices would be seen by Fabric 5.

Fabric 5 does not need to know about events happening in other fabrics, such as zoning changes, FSPF topology changes, and Name Server database updates. The router insulates edge fabric services from each other. If this network had been created as a single large fabric, every switch would require the complete zoning database, all FSPF Link State Advertisements, and all Name Service (NS) updates. The increased service load can create timing and resource issues that prevent the fabric from being stable, or even from forming at all.

Centralized Resources

The previous example showed a way of creating selective, peer-to-peer connectivity between edge fabrics. Many enterprises with large-scale requirements need such connectivity in order to connect many devices into one centralized SAN. Figure 15 shows such a case.

Figure 15. Large-scale tape consolidation

This design is similar to the Figure 13 variation in that it uses backbone/edge routing. The centralized fabric uses IR to attach all of the organization’s tape devices, while the four edge fabrics contain the hosts and storage arrays, sorted by division or application group. The backbone becomes the tape fabric. Notice the EX_Port connection methodology: on the Brocade DCX deployed for tape, 16 EX_Ports can be configured on its port blades and connected to 16 E_Ports on the blades of the application chassis. Each edge fabric is connected to the backbone with four IFLs, two to each of two 8 Gbit/sec port blades. Those connections are made to different blades on both sides to ensure that a single blade failure cannot eliminate all connectivity.


Notice that the IFLs between each pair of blades use adjacent ports. If four IFLs are used, then it is wise to have two groups of two. This follows Brocade ISL attachment best practices, which are:

• Spread connections between at least two blades for high availability. Two groups are used to provide redundancy; in addition, DPS (Dynamic Path Selection) is performed across the two groups.

• Keep any additional ISLs co-located within the 8-port group to allow the use of optional ISL Trunking.

Brocade ISL Trunking creates a single logical link from 2 to 8 physical connections. If more IFLs were needed in the future, it would be beneficial to reserve ports for them in advance so that they can be kept within the same port groups; this aggregates more bandwidth for every IFL connected. In the past, depending on the number of FR4-18i blades, there was a limited number of EX_Ports (16 per blade) and port groups (2 per blade), which made it difficult to spread the IFLs across many port groups while keeping ports available in the same group for future growth. IR enables one port group for every 8 ports on a Brocade DCX FC8 blade to be used for this type of application.
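The placement rules above can be expressed as a small check (illustrative Python, not vendor code; it assumes 8-port trunk groups, trunks of 2 to 8 members, and connections spread across at least two blades, as described in the text):

```python
# Sketch of the IFL placement best practices described above.
PORTS_PER_GROUP = 8   # trunk-capable port group size (per the text)

def port_group(blade, port):
    """Ports trunk only with ports in the same 8-port group on the same blade."""
    return (blade, port // PORTS_PER_GROUP)

def check_ifl_layout(ifls):
    """ifls: list of (blade, port) tuples.

    Returns (spread_ok, trunks): whether the IFLs span at least two
    blades for HA, and which port groups form valid 2-8 member trunks.
    """
    blades = {b for b, _ in ifls}
    groups = {}
    for b, p in ifls:
        groups.setdefault(port_group(b, p), []).append((b, p))
    trunks = {g: ports for g, ports in groups.items() if 2 <= len(ports) <= 8}
    return len(blades) >= 2, trunks

# Four IFLs laid out as two groups of two, on two different blades;
# DPS load-balances across the two resulting trunk groups.
ok, trunks = check_ifl_layout([(1, 0), (1, 1), (2, 8), (2, 9)])
print(ok, len(trunks))   # True 2
```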

If the existing directors are Brocade 48000 Directors, Brocade FR4-18i blades will still have to be used. In this case, the 16 IFLs connect using the same methodology as described above, except that the IFLs are confined to the EX_Ports on the Brocade FR4-18i.

Of course, while this design is intended to allow access to the central tape fabric, it does not preclude connectivity from edge fabric to edge fabric should that be needed in the future.

Multi-Site DR Solution

This example is similar to the case shown in Figure 13, except over a Metropolitan Area Network (MAN) to a remote DR site.

Figure 16. Multi-site DR solution with FCR blades


The customer in this example had six offices in a metro area that needed SAN-level connectivity to a DR site 50 km away. The diagram represents one fabric; the other fabric is identical. At their headquarters, they had over 750 host and storage ports connected to a pair of Brocade DCX Backbones connected by ICLs. The manufacturing facility had a single 384-port Brocade DCX Backbone. Each of four sales offices had individual Brocade 5100 Switches. The design they selected was to install a Brocade DCX chassis at the DR site. A Brocade Extended Fabrics license is used at each site to ensure full-speed connectivity to the DR site across the MAN. This design creates 6 edge fabrics (FID 1 through 6) and one backbone fabric, which is the DR site fabric itself.

This is similar to the example in Figure 15, in that it connects many edge fabrics to a set of centralized resources, specifically those at the DR site. It is possible to configure an LSAN from a sales office to headquarters, but the traffic would need to flow through the DR site. If during the planning phase it appeared likely that traffic would regularly flow between the satellite offices and headquarters, it would be more efficient to locate the backbone at headquarters and eliminate the intermediate hop through the DR site. This assumes that the MAN connections could be made directly from headquarters to the sales offices.

The Brocade DCX Backbone at the DR site uses IR EX_Ports to form the connections to the remote edge fabrics. The same best practices described in the previous example can also be used here; however, additional consideration must be given to Brocade ISL Trunking in this scenario. Brocade ISL Trunking requires that the distances of all the IFLs in a trunk group be essentially the same. When DWDM uses the same fiber to transmit multiple IFLs, each on its own lambda, this is not a problem, because all the IFLs are inherently the same length. Keep in mind, however, that redundancy is provided not by a single Brocade trunk group but by deploying multiple trunk groups: think of each trunk group as a single logical ISL or IFL. When two Brocade trunk groups are used, each should use a different fiber run or operate over a different span of the DWDM ring.
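The equal-length requirement can be sketched as a simple check (hypothetical Python; the 30 m skew tolerance is a placeholder chosen for illustration, not an official Brocade figure):

```python
# Trunk members must be of essentially equal length. The allowed
# difference used here (30 m) is an assumed placeholder value.
MAX_SKEW_M = 30

def trunkable(lengths_m):
    """True if all member link lengths fall within the skew tolerance."""
    return max(lengths_m) - min(lengths_m) <= MAX_SKEW_M

# Lambdas multiplexed onto the same DWDM fiber are inherently equal length:
print(trunkable([50_000, 50_010]))  # True
# Links taking different fiber routes around a ring usually are not:
print(trunkable([50_000, 62_000]))  # False
```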

GENERAL DESIGN CONSIDERATIONS

It is useful to look at specific case studies, as in the previous section; however, in most practical situations the design problem is at least somewhat different. A designer therefore needs to be aware of the reasoning behind each design in order to make intelligent decisions and changes when necessary. Many aspects of routed SAN design techniques and best practices are already intuitive to SAN designers and engineers. Some aspects are not and, unless these subjects are explicitly discussed, results may be sub-optimal or even dysfunctional. Many organizations cannot tolerate unavailable systems for any reason, especially due to a preventable storage outage. This section covers some of the less-intuitive aspects of routed SAN design to help designers avoid those problem areas.

HA Design

High Availability network designs are always recommended. SAN downtime can take mission-critical systems offline, resulting in huge losses for an enterprise. A rule of thumb: if the harm to the organization when attached devices suddenly stop working exceeds the expense of deploying and managing a second redundant fabric, then that SAN should be made redundant. The greater the impact of such an event, the greater the degree of redundancy that should be used. This leads to an important SAN design principle: always match the redundancy model to the criticality of the systems it protects. In other words, the more important the system, the more robust the redundancy.


There are four levels of redundancy in SAN design:

1. Single non-resilient fabric

2. Single resilient fabric

3. Redundant but non-resilient fabrics

4. Redundant and resilient fabrics

In the context of FCR solutions, these would be characterized as described in the following sections.

No fabric redundancy with one FCR platform

The platform might have multiple IFLs, but the platform itself is still a Single Point of Failure (SPoF): if the router fails, inter-fabric connectivity is lost. Administrative activities also subject the network to disruptive downtime when hardware upgrades or certain software operations are performed.

No fabric redundancy with two or more FCR platforms

The failure of an FCR platform would not cause an outage, nor would potentially disruptive operations such as hardware upgrades. However, the entire non-redundant network is subject to failure and is not considered an HA design. For example, if each host has only one HBA connection and an edge fabric is down, then all hosts connected to that fabric will also be down.

Dual-redundant fabrics each with one FCR platform

The hosts have multipathing software installed to handle failover and load balancing between the dual-redundant fabrics. This is an HA solution and is probably the most popular deployment model for FC routers. The risk is that communications between edge fabrics on one of the two redundant fabrics may cease if an FCR platform fails, leaving the remaining fabric non-redundant for the duration of the failure. For some enterprises this may be too much risk, depending on the applications crossing the router.

Dual-redundant fabrics each with two or more FCR platforms

This is the best approach from an HA viewpoint. If an FCR platform fails, the dual-redundant fabrics can continue to function, preventing the excessive risk of operating on a single non-redundant fabric. While this design has the greatest availability, it also costs the most to implement.

HA SAN design starts with dual-redundant fabrics, an industry-wide best practice. Two physically separate and independent fabrics are used. Hosts and storage devices have connections to each of these fabrics and continue to operate even in the event of a catastrophic failure affecting one entire fabric.

HA design also provides protection against an outage when certain kinds of upgrades or other maintenance activities are performed. Brocade Fabric OS features non-disruptive firmware upgrades and downgrades, though changing some fabric configuration parameters requires disabling the platform. Scheduled downtime is as detrimental to application availability as unplanned downtime: the application is still not available, and in a 24x7 data center that is a problem. The difference is in the timing; users expect planned downtime and can prepare for it, whereas unplanned downtime catches them off guard. Either way, production stops and Service Level Agreements (SLAs) are in jeopardy. Upgrades on the Brocade 7500/FR4-18i interrupt FCIP service for a short period, but Fibre Channel traffic, including routed traffic, is not disrupted. (FCIP links are ISLs extended over IP, extending edge fabrics or backbones.)

The Brocade 7500, 5300, and 5100 are all standalone platforms, not fully hardware-redundant platforms, meaning that a single failure, for example an issue with the main PCB, may disrupt the entire unit. For these devices, redundancy is provided by having multiple units in place. The bottom line is this: there will always be some administrative operations that result in service interruption, even on an enterprise-class platform. The best way to ensure availability is to use redundant platforms. If maintaining the connection between fabrics is considered important to operations, each site should have at least two distinct FCR platforms connected to each dual-redundant fabric.

In general, the redundancy and resiliency designed into each level is constrained by budget. Just because redundant power supplies in switches could be eliminated to save money does not imply that they should be. The same applies to deploying redundant FCR platforms. The decision about how much redundancy to build into each tier of the network comes down to site-specific tradeoffs among price, performance, and availability.

Interoperability

All EX_Ports, whether IR or Brocade 7500/FR4-18i, support interoperability with Brocade FOS products in native mode, as well as with classic Brocade M-EOS products and classic McDATA EOS products in both McDATA native fabric mode and open fabric mode. The scalability limits of Brocade M-EOS edge fabrics are the same as if the M-EOS fabric were standalone: 31 domains and 2,048 devices. This is true for both M-EOS fabric modes.

M-EOS fabrics cannot act as the backbone or be part of the backbone fabric. Routing nodes between an M-EOS fabric and FOS fabrics requires the use of an EX_Port, which means that the M-EOS fabric must be an edge fabric.

NOTE: Interoperability with Cisco MDS FC platforms is not supported.

Scalability

Scalability can mean quite a few different things in the context of storage networking. For example, it can be a measure of how much data a particular RAID enclosure accommodates. In the context of SAN design, it usually refers to how many ports a network model can support without needing to be fundamentally restructured. Pertinent to FCR scalability are these considerations:

• How many network devices, such as switches and routers, can be deployed in a fabric?

• How many separate edge fabrics can communicate across the routers?

• How many devices can be exported at any given time?

• How large can a backbone fabric be?

All router platforms, including the 8 Gbit/sec port blades for the Brocade DCX, the Brocade 7500/FR4-18i, and the Brocade 5300 and 5100 Switches, process FCR in hardware and support a large number of proxy devices. Scalability support is inherent in the nature of the technology.

This section discusses how some of those scalability limitations apply to routed SAN design. There are definite scalability limits specific to various aspects of FCR, which may change as Fabric OS versions mature. (See Fabric OS Release Notes for up-to-date scalability metrics.)

General Nature of Scalability Limits

All networks have scalability limitations. Even the IPv4 protocol, which forms the basis for the Internet, has reached its limit and is starting to be replaced by the IPv6 protocol, which also has limits, albeit much higher ones. The limitations of a network can generally be classified into categories such as manageability, fault containment, vendor support, network services, and the protocol addressing structure.

The protocol limits of FC are much higher than any network that can actually be built. The limiting factors of an FC network center not on the FC protocol but on the fabric services that run on the network. The databases supporting zoning, FSPF, and name services, which are synchronized across all switches in a fabric, are limiting, as is the computing power required to process those services on each switch. Larger FC networks take longer to converge, and as the number of switches increases, the opportunity for instability increases; therefore, there is reason to keep the size of FC networks to a minimum. Each OEM vendor has established its own scalability rules (vendor support matrices are beyond the scope of this document).

Fault Containment

The networking industry as a whole realized years ago that Ethernet segments had a practical scalability limit related to broadcast domains and fault isolation. Even if it were possible to create a single Ethernet segment with thousands of devices, it would not be a good idea to do so. Inevitably, no matter how diligent the network engineering and management teams were, eventually something unanticipated would go wrong and take down a segment. A malfunctioning Network Interface Card (NIC) causing a broadcast storm was beyond their control, and the best defense was to limit the number of affected users through containment. Each broadcast that arrives at a host interrupts its CPU, and thousands of broadcasts will incapacitate a host. The more hosts on a segment, the bigger the disruption, and the more likely that additional hosts will join the broadcast storm.

To address this issue, network engineers began limiting the size of segments in order to limit the scope of problems on a segment. The industry eventually settled on a preferred Ethernet segment size of one class C IP subnet, about 250 devices, and treated that as a scalability limit for Ethernet. Of course, this implies that multiple segments have to be connected together somehow, because more than 250 devices exist in an organization. To strike a balance between providing connectivity and limiting the scope of faults, engineers created a hierarchical networking model based on routers. This approach is now available for storage networks as well, by connecting fabrics together via FC routers.

An analogy is frequently made between broadcast storms and fabric reconfigurations. With large flat fabrics, certain kinds of reconfigurations can create stability issues in much the same way that certain kinds of broadcast storms can destabilize an Ethernet segment. Larger fabrics are proportionally more likely to experience convergence timing issues during reconfigurations, just as larger Ethernet segments are more likely to experience broadcast storms—and in both cases this creates a scalability constraint. In a Fibre Channel SAN, a fabric reconfiguration in one edge fabric is not propagated to another through a router, which is analogous to the way that an IP router contains broadcast storms.

FC fabrics have advanced software running on their switches, permitting multi-thousand port Layer 2 fabrics to exist without fault. However, for customers pushing the limits of fabric scalability without routing, fabric convergence time must be considered and can be an issue. This issue simply does not exist between smaller routed edge fabrics.

Fabric Services

Because the edge fabric terminates at the logical domains in the router, the zoning database, name services, and fabric topology data are not propagated to other edge fabrics. Name Server (NS) entries cross edge fabric boundaries only for devices that have been explicitly exported via LSAN zones, in the form of a proxy. NS databases can grow only so large, due to limited resources in the switch. A device exported from its native edge fabric forms a proxy device in the foreign edge fabric and consumes an entry in the foreign fabric’s NS, the same as any other attached host or array port in that fabric. Each virtualized device discovered by N_Port ID Virtualization (NPIV) also consumes an NS entry. Consequently, adding proxy devices to an already large edge fabric can push the NS to its limits.

For example, if one fabric has a scalability limit of 3,000 NS entries and currently has 2,500 devices, and another fabric in the same environment is in the same position, an administrator could not merge the fabrics without exceeding each fabric’s NS capabilities. On the other hand, if the two fabrics were connected via SAN routers, up to 500 devices could still be shared into each edge fabric without hitting the NS scalability limits.
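The arithmetic in this example is worth making explicit (a quick Python sketch of the figures above):

```python
# Name Server capacity arithmetic from the example above.
NS_LIMIT = 3000        # per-fabric NS entry limit (example figure)
local_devices = 2500   # devices already present in each fabric

# With routing, each fabric can import proxies only up to its headroom:
headroom = NS_LIMIT - local_devices
print(headroom)        # 500 proxies can be shared in before hitting the limit

# Merging the fabrics instead would require 2500 + 2500 = 5000 entries
# in a single NS, exceeding the 3000-entry limit in both fabrics.
```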

Best practice is to only create LSAN zones for devices that actually need to communicate across the routers, rather than adding all devices to LSAN zones arbitrarily.

Allowable Number of Edge Fabrics

The processor and memory in routers are limited, so the number of edge fabrics a router can support is finite. To maximize scalability, a balance must be struck between individual edge fabric size and the number of edge fabrics. For example, if a single data center is consolidating 5,000 ports into a single routed SAN, a designer could choose to target two 2,500-port edge fabrics, ten 500-port edge fabrics, or one hundred 50-port edge fabrics. All other things being equal, ten 500-port fabrics would probably be the best approach: each individual edge fabric is a manageable size, and the number of edge fabrics is also manageable. This is more of a guideline than a rule or best practice, since there are often competing considerations with greater weight.
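The trade-off can be enumerated in a few lines. This is a hypothetical sizing sketch: the candidate splits mirror the text, but the "manageable" thresholds are illustrative assumptions, not rules from this paper.

```python
# Enumerate ways to split a 5,000-port SAN into equal-sized edge fabrics
# and flag the splits that keep both dimensions manageable.
TOTAL_PORTS = 5000
candidate_fabric_counts = [2, 10, 100]   # splits from the text's example

for fabrics in candidate_fabric_counts:
    ports_per_fabric = TOTAL_PORTS // fabrics
    # Assumed comfort zones (illustrative only): each edge fabric small
    # enough to converge quickly, few enough fabrics to administer easily.
    fabric_size_ok = ports_per_fabric <= 1000
    fabric_count_ok = fabrics <= 20
    print(fabrics, ports_per_fabric, fabric_size_ok and fabric_count_ok)
```

Only the ten-fabric split satisfies both constraints, which matches the guidance above; in practice, competing considerations (HA pairing, existing fabrics, cost) would also weigh in.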

Another approach for large-scale environments is to divide the solution into multiple backbones. The first step is to ensure that Fabrics A and B are not interconnected in any way, that is, not attached to the same set of routers. This is the correct approach from an HA standpoint, and it alone doubles the overall scalability. Next, consider whether all the fabrics require connectivity into the backbone. If the design involves multi-port arrays that can attach to each and every fabric, it might not be necessary to route between fabrics at all: simply connect them all to the array, and every fabric then has access to the same set of storage volumes. Note, however, that not all storage arrays have enough ports to support this alternative approach.

In any case, this level of scalability analysis is rarely warranted. Unless the environment is going to involve more than a dozen fabrics per backbone, this simply will not come up. It actually has more applicability to manageability than scalability: it is easier to manage fabrics with hundreds of ports than to manage fabrics with thousands of ports.

Per-Chassis Scalability

Brocade DCX Backbone. The Brocade DCX supports up to 4 FR4-18i blades with no caveats; up to 8 FR4-18i blades can be used in a Brocade DCX as long as only FCIP and FC-FW are being used and no routing functions. All 128 FC ports can be used as EX_Ports. For FCR-only implementations, IR is recommended over FR4-18i blades, due to its flexibility and higher port speed. IR on the Brocade DCX supports 128 EX_Ports per chassis, enabling connectivity to a large number of edge fabrics, each with an adequate number of IFLs for aggregated bandwidth and redundancy. If two chassis are linked via ICL as a backbone, each can support 128 EX_Ports, for a total of 256 EX_Ports. The main purpose of Brocade FR4-18i blades in the Brocade DCX is to provide FCIP capabilities.

IR EX_Ports and Brocade 7500/FR4-18i EX_Ports use identical protocols and are therefore completely interoperable when connected to the same edge and backbone fabrics. However, Brocade FR4-18i EX_Ports cannot coexist in the same chassis with enabled IR EX_Ports, and vice versa: it has to be one or the other. VEX_Ports used with FCIP are not restricted: in a Brocade DCX configured to use IR, an FR4-18i configured to use VEX_Ports is fully supported. (VEX_Ports are simply EX_Ports encapsulated in IP.)

Brocade 5300 and 5100 Switches. The Brocade 5300 supports 80 IR EX_Ports per switch (out of a total of 80 ports). The Brocade 5100 supports 40 IR EX_Ports per switch (out of a total of 40 ports).

Brocade 48000 Director. The Brocade 48000 supports 8 Gbit/sec blades on FOS 6.1; however, the IR feature is not supported due to the 4 Gbit/sec CP4 ASICs that switch backbone traffic. The Brocade 48000 can support two FR4-18i blades for routing; more than two blades can be used for FCIP and FC-FW if no routing functions are configured. If additional routing ports are needed, the alternatives are to deploy more FR4-18i blades across different chassis, use the Brocade 7500, or deploy the Brocade DCX, 5300, or 5100. In fact, there are a number of advantages to the latter approach, including increased scalability, higher availability (if multiple backbones are used), and often lower cost.

Complex Topologies

At the highest level, only Core/Edge (CE) designs and their variations are supported by FCR. That is, there are exactly two tiers: the edge fabrics and one or more parallel backbone fabrics. It is not currently possible for traffic to cross more than one backbone in series from end to end, or in a circular fashion (known as multi-hop routing). This design decision is related to the hop-count restrictions recommended for inter-data-center, block-level communications. Many applications cannot withstand even a moderate amount of latency and still perform as users expect, so data can traverse only a limited number of ISLs and/or IFLs while maintaining acceptable performance.

It is possible to accommodate nearly all requirements using the topology offered by FCR and IR, while continuing to maintain high performance standards. The reality is that after reaching the proxy located in the router, the data must pass through either a backbone or another edge fabric or both before reaching the final destination. This can triple the number of real hops to a destination.
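The "triple the hops" point can be made concrete with a small accounting sketch. The per-fabric hop counts below are made-up example values, not measurements or product limits.

```python
# Illustrative hop accounting for a routed FCR path. Once a frame reaches
# the proxy in the router, it must also cross the backbone and the
# destination edge fabric before arriving at the real target.

def end_to_end_hops(source_edge_hops, backbone_hops, dest_edge_hops):
    """Real switch hops traversed from host to target across a router."""
    return source_edge_hops + backbone_hops + dest_edge_hops

# A host in a flat fabric would see only its local hops; across a router
# the real path also includes the backbone and destination edge fabric.
local_only = 2
routed = end_to_end_hops(source_edge_hops=2, backbone_hops=2, dest_edge_hops=2)
print(local_only, routed)    # the routed path triples the real hop count
```

This is why hop-count budgets for latency-sensitive, block-level traffic should be evaluated against the full routed path, not just the source edge fabric.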

There can be quite a bit of variation in each edge and backbone fabric. An edge fabric or backbone does not need to be a Core/Edge design and often will not or should not be. The backbone(s) between routers in most of the examples in this paper are cascaded or meshed in some way, which is good practice. Many of the examples use single-domain fabrics and that is also best practice in environments that are small enough. Even if the existing fabrics are full meshes or arbitrarily complex partial meshes, the router can connect to them without issues. However, just because something can be done does not imply that it should be done. It is always best to use as simple a topology as possible. Remember Occam's Razor: if you can make a network smaller in radius and simpler in design, it is best to do so. (William of Occam was the medieval European philosopher who said, “The simplest solution is usually the best.”)

FICON

FCR does not support FICON. The use of proxies in FCR makes it impractical for users to configure control unit addresses in the mainframe. In addition, there are other important communications that must occur under various situations that are not accommodated by FCR. For the time being, cascaded FICON fabrics have to be merged, including those that span MAN/WAN connections.

SUMMARY

This paper reviewed enabling technologies for routed SAN solutions. By adding hierarchical networking to Fibre Channel, Brocade has combined the flexibility and scalability of data networks with the performance and reliability of FC fabrics. FCR brings unprecedented reliability, manageability, security, flexibility, and scalability to SANs, and makes deployments practical that were not even theoretically possible before its introduction. A number of architectures were discussed, including a backbone fabric with multiple edge fabrics and routing among those fabrics through the backbone fabric. Also included were designs that route from the backbone fabric to an edge fabric.

© 2008 Brocade Communications Systems, Inc. All Rights Reserved. 05/08 GA-AB-060-00

Brocade, Fabric OS, File Lifecycle Manager, MyView, and StorageX are registered trademarks and the Brocade B-wing symbol, DCX, and SAN Health are trademarks of Brocade Communications Systems, Inc., in the United States and/or in other countries. All other brands, products, or service names are or may be trademarks or service marks of, and are used to identify, products or services of their respective owners.

Notice: This document is for informational purposes only and does not set forth any warranty, expressed or implied, concerning any equipment, equipment feature, or service offered or to be offered by Brocade. Brocade reserves the right to make changes to this document at any time, without notice, and assumes no responsibility for its use. This informational document describes features that may not be currently available. Contact a Brocade sales office for information on feature and product availability. Export of technical data contained in this document may require an export license from the United States government.