High Availability in Campus Network 2012-Usa-PDF-BRKCRS-3032
© 2012 Cisco and/or its affiliates. All rights reserved. BRKCRS-3032 Cisco Public
Advanced Enterprise Campus Design: Resilient Campus Networks (BRKCRS-3032)
Presenter
Rahul Kachalia – CCIE #11732 (R&S and SP)
Technical Marketing Engineer
System Development Unit (SDU)
Design Zone for Borderless Networks www.cisco.com/go/designzone/borderless
Borderless Campus CVD http://www.cisco.com/en/US/docs/solutions/Enterprise/Campus/Borderless_Campus_Network_1.0/Borderless_Campus_1.0_Design_Guide.pdf
What Are Your Uptime Requirements?
Campus network design is evolving in response to multiple drivers:
User Expectations: Always-on access to communications
Industry Requirements: Financial, healthcare, 7x24x365 global access
Technology Requirements: Services, applications and communications, i.e., Unified Communications and video
Requires a structured and resilient design
[Figure: Global Enterprise Availability; Collaboration and Real-Time Communication]
How Does Downtime Affect Voice?
Availability requirements for UC are more than just five 9's
Also need to consider the subjective impact on real-time communications
[Figure: seconds of data loss (0 to 50 sec) versus impact on voice, with markers at 200 ms, 1 sec and 5-6 sec; impact ranges from no impact to voice, through minimal impact to voice and the user hanging up, to phone resets*]
* The time for a phone to reset is variable and depends on the signaling protocol (SCCP or SIP) and the state of the call (active, ringing, …)
How Does Downtime Affect Voice or Video?
Network SLAs vary for traditional video conferencing versus TelePresence
Availability requirements for high-definition TelePresence are more stringent than for UC

Metric | TelePresence Target | TelePresence Threshold 1 (Warning) | TelePresence Threshold 2 (Call Drop) | Traditional Video Conferencing
Latency | 150 ms | 200 ms | 400 ms | 400-450 ms
Jitter | 10 ms | 20 ms | 40 ms | 30-50 ms
Loss | 0.05% | 0.10% | 0.20% | 1%
BW | 2.5-12.6 Mbps + overhead | | | 384 or 768 kbps + overhead
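The TelePresence columns of the table can be read as a three-level check. The sketch below is illustrative only: it assumes that a value at or above Threshold 2 means call drop, at or above Threshold 1 means warning, and anything above the target is out of target. The names THRESHOLDS, classify and worst_state are hypothetical and not part of any Cisco tool.

```python
# Hypothetical SLA checker built from the TelePresence columns of the table.
# Tuple order: (target, threshold 1 = warning, threshold 2 = call drop).
THRESHOLDS = {
    "latency_ms": (150, 200, 400),
    "jitter_ms": (10, 20, 40),
    "loss_pct": (0.05, 0.10, 0.20),
}

# States ordered from least to most severe.
SEVERITY = ["within target", "above target", "warning", "call drop"]


def classify(metric: str, value: float) -> str:
    """Classify one measurement against its TelePresence thresholds."""
    target, warning, drop = THRESHOLDS[metric]
    if value >= drop:
        return "call drop"
    if value >= warning:
        return "warning"
    if value > target:
        return "above target"
    return "within target"


def worst_state(sample: dict) -> str:
    """Return the most severe state across all measured metrics."""
    return max((classify(m, v) for m, v in sample.items()), key=SEVERITY.index)


if __name__ == "__main__":
    # Latency and loss are fine, but 25 ms of jitter crosses Threshold 1.
    print(worst_state({"latency_ms": 120, "jitter_ms": 25, "loss_pct": 0.02}))
```

Run directly, it prints warning. Taking the worst metric as the overall state is a choice made for the sketch; a real monitoring system may weight or average samples differently.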
Design Strategies For Network Survivability
Resiliency Goal: Non-Disruptive Network and Service Availability
Resiliency Strategy: Network Level Resiliency, System Level Resiliency, Operational Level Resiliency
Resiliency Technologies: ECMP, EtherChannel, UDLD, NSF/SSO, Power Redundancy, ISSU, eFSU, GOLD, EEM
High Availability Campus Design Agenda
Network Level Resiliency
‒ High Availability Design Principles
‒ Simplified and Redundant Campus Design
‒ Campus Routing Best Practices
System Level Resiliency
‒ Integrated Hardware and Software Resiliency
‒ Stateful Switchover and Non-Stop Forwarding
‒ Hitless Switching
Operational Level Resiliency
‒ Single and Multi-Chassis ISSU Upgrade
‒ Hitless NX-OS Software Upgrade
Simple Demand, Complex Design?
Highly Redundant Network Design
Advantages:
‒ Redundant systems and network paths at mission-critical network points
‒ Protects network availability during a major network fault
Disadvantages:
‒ Becomes complex as it scales
‒ More control and management planes to operate
‒ Redundant control plane with redundant topology information
Single-System Network Design
Advantages:
‒ Operational simplicity: a single control plane between layers
‒ Redundant network paths
‒ Single-chassis system redundancy
‒ Cost-effective solution for small network designs
Disadvantages:
‒ Single point-of-failure design
‒ Any major network fault can cause a complete network outage
‒ May not be a very cost-effective design compared with dual systems
Structured and Modular Designs Work Best
Optimize the interaction of the physical redundancy with the network protocols
‒ Provide the necessary amount of redundancy
‒ Pick the right protocol for the requirement
‒ Optimize the tuning of the protocol
We want to build networks that look like this, so that we can map the protocols onto the physical topology:
[Figure: structured campus design connecting to the WAN, the Internet and the data center, with callouts for redundant switches, redundant supervisor, redundant links, and Layer 2 or Layer 3]
Access Layer Redundancy with Dual-Sup
Non-stop business communication with redundant supervisors
Distribute multiple uplinks across both supervisors for the following benefits:
‒ Improve network resource utilization
‒ Minimize control-plane disruption
‒ Improve network recovery to sub-second
‒ Maximize network-level protection
Protects switching capacity, network topology and forwarding information during supervisor switchover
[Figure: Catalyst 4500E with Sup-1 and Sup-2, uplinks distributed across both supervisors]
Access Layer Redundancy with Single-Sup
Flexible edge network and bandwidth expansion
Multiple built-in supervisor uplink ports for a high-speed distribution-access block
Plan inter-distribution link capacity to handle large-scale data re-routing
Minimize network congestion with distributed high-speed uplink connections to the aggregation system
[Figure: three single-supervisor Catalyst 4500E access designs, labeled "1G Uplink / 10G", "10G Uplink / 10G" and "10G Uplink / 10G"]
Simplified, Scalable & Reliable Access Network with Cisco StackWise Plus
Control and Mgmt Plane
‒ Single management
‒ Centralized control-plane architecture
‒ NSF capable
Physical Network
‒ Network expansion as the network grows
‒ Consolidation of several 1G links into 10G
‒ High-speed stack ring for intra-access traffic
Network Design
‒ Single point-to-point network
‒ Distributed forwarding architecture
‒ Reduces VLANs and subnets
Uplink Redundancy with Cisco StackWise Plus
Dual vs Quad Uplink Design Alternatives
Build uplinks with two stack-member switches.
Protocol-driven network recovery with dual uplinks
Quad distributed uplinks:
‒ Increase uplink capacity
‒ Hardware-driven network recovery with the traditional distribution design
‒ Prevent network topology changes and improve network recovery
[Figure: stack members SW1 and SW9 uplinked to Dist-1 and Dist-2]
We Will Be Talking About Solutions for Two Distribution Block Models
Traditional Distribution Block Design
‒ Dual standalone systems
‒ Distributed planes
‒ Protocol-dependent fault detection and recovery
Evolution Network Design
‒ Single virtual system
‒ Unified control and management plane; distributed forwarding plane
‒ Deterministic network recovery
[Figure: both models serving access VLANs 10, 20 and 30]
Traditional Distribution Design
Redundant design with a sub-optimal topology and complex operation.
Stabilize the network topology with several Layer 2 protection features:
‒ STP primary and backup root bridge
‒ Rootguard
‒ Loopguard or Bridge Assurance
‒ STP edge protection
Protocol-restricted forwarding topology:
‒ STP FWD/ALT/BLK ports
‒ Single active FHRP gateway
‒ Asymmetric forwarding
‒ Unicast flooding
Protocol-dependent network recovery:
‒ PVST/RPVST+
‒ FHRP tuning
[Figure: distribution pair with the HSRP active gateway and STP root; callouts for Rootguard, Loopguard or Bridge Assurance, Bridge Assurance, BPDU Guard or PortFast, and Port Security]
Even with Faster Convergence from RPVST+ We Still Have to Wait on FHRP Convergence
GLBP offers load balancing within a VLAN
For voice, a sub-second hello timer enables upstream traffic recovery in under 1 second
Sub-second protocol timers must be avoided on SSO-capable networks
[Figure: FHRP active and FHRP standby gateways]
HSRP Config
interface Vlan4
 ip address 10.120.4.2 255.255.255.0
 standby 1 ip 10.120.4.1
 standby 1 timers msec 250 msec 750
 standby 1 priority 150
 standby 1 preempt
 standby 1 preempt delay minimum 180
GLBP Config
interface Vlan4
 ip address 10.120.4.2 255.255.255.0
 glbp 1 ip 10.120.4.1
 glbp 1 timers msec 250 msec 750
 glbp 1 priority 150
 glbp 1 preempt
 glbp 1 preempt delay minimum 180
VRRP Config
interface Vlan4
 ip address 10.120.4.1 255.255.255.0
 ip helper-address 10.121.0.5
 no ip redirects
 vrrp 1 description Master VRRP
 vrrp 1 ip 10.120.4.1
 vrrp 1 timers advertise msec 250
 vrrp 1 preempt delay minimum 180
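The timer values in these configs translate directly into failure-detection time. A minimal sketch, assuming HSRP/GLBP declare the active gateway down after a full hold time without hellos, and using the RFC 5798 (VRRPv3) Master_Down_Interval formula for comparison; Cisco's millisecond VRRP timers may be computed differently, and the helper names are hypothetical.

```python
# Failure-detection arithmetic for the FHRP timer examples above.
# Detection time only; real failover adds processing and ARP/MAC update time.


def hold_based_detection_ms(hello_ms: int, hold_ms: int) -> int:
    """HSRP/GLBP: the standby declares the active gateway down once no hello
    has arrived for the full hold time, so the worst case equals the hold time."""
    if hold_ms <= hello_ms:
        raise ValueError("hold time must be longer than the hello interval")
    return hold_ms


def hellos_tolerated(hello_ms: int, hold_ms: int) -> int:
    """Number of consecutive hellos that fit inside one hold time."""
    return hold_ms // hello_ms


def vrrpv3_master_down_ms(advert_ms: float, priority: int) -> float:
    """RFC 5798: Master_Down_Interval = 3 * Master_Adver_Interval + Skew_Time,
    where Skew_Time = ((256 - priority) * Master_Adver_Interval) / 256."""
    skew = (256 - priority) * advert_ms / 256
    return 3 * advert_ms + skew


if __name__ == "__main__":
    print(hold_based_detection_ms(250, 750))     # tuned HSRP/GLBP timers
    print(hold_based_detection_ms(3000, 10000))  # HSRP defaults (3 s / 10 s)
    print(vrrpv3_master_down_ms(250, 100))       # backup with default priority 100
```

The comparison with the 3 s / 10 s defaults shows why the tuned 250/750 ms timers matter for voice, and also why they conflict with SSO: a 750 ms hold time is shorter than a typical supervisor switchover.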
PIM Needs Timer Tuning Too
Multicast recovery depends on PIM DR failure detection in a Layer 2 network
PIM routers advertise the neighbor expiration time in their query (hello) messages:
‒ Default query interval: 30 seconds
‒ Expiration: query interval x 3
‒ DR failure detection: ~90 seconds
Tune the PIM query interval to sub-second values, as with FHRP, for faster multicast convergence
Sub-second protocol timers must be avoided on SSO-capable networks
interface Vlan4
 ip pim sparse-mode
 ip pim query-interval 250 msec
[Figure: PIM DR on the distribution pair]
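The DR failure-detection figures follow from multiplying the query interval by the expiration multiplier. This sketch uses the x3 multiplier stated on the slide as its default; RFC 7761 defines the default Hello holdtime as 3.5 x the Hello period, so the multiplier is left as a parameter. The function name is hypothetical.

```python
# DR failure-detection estimate from the PIM query interval.
# multiplier=3 follows this slide ("Query Interval x 3"); RFC 7761 specifies a
# default Hello holdtime of 3.5 x the Hello period, so check your platform.


def pim_dr_detection_s(query_interval_s: float, multiplier: float = 3.0) -> float:
    """Seconds until neighbors expire a silent DR and elect a new one."""
    if query_interval_s <= 0:
        raise ValueError("query interval must be positive")
    return query_interval_s * multiplier


if __name__ == "__main__":
    for interval in (30, 0.25):  # default query interval, then the tuned value
        print(interval, "->", pim_dr_detection_s(interval))
```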
Sub-second Protocol Timers and NSF/SSO
NSF is intended to provide availability through route convergence avoidance
Fast IGP timers are intended to provide availability through fast route convergence
In an NSF environment, the dead timer must be greater than:
SSO recovery + routing protocol restart + time to send the first hello
Recommendation: keep protocol timers at their defaults
[Figure: NSF-capable and NSF-aware routers exchanging hellos; after neighbor loss, graceful restart proceeds through NSF restart, RP restart and the first OSPF hello]
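The NSF timer rule is a simple inequality, sketched below. The restart timings in the example (1 s SSO recovery, 2 s protocol restart, 1 s to first hello) are hypothetical placeholders, not Cisco measurements; 40 s is the OSPF default dead interval on broadcast and point-to-point networks.

```python
# Check of the NSF timer rule stated above:
#   dead timer > SSO recovery + routing protocol restart + time to first hello
# The timing values in the example are hypothetical placeholders, not measurements.


def dead_timer_survives_sso(dead_s: float, sso_recovery_s: float,
                            protocol_restart_s: float, first_hello_s: float) -> bool:
    """True if neighbors keep the adjacency up until the first post-switchover hello."""
    return dead_s > sso_recovery_s + protocol_restart_s + first_hello_s


if __name__ == "__main__":
    restart = dict(sso_recovery_s=1.0, protocol_restart_s=2.0, first_hello_s=1.0)
    print("default OSPF dead timer (40 s):", dead_timer_survives_sso(40, **restart))
    print("sub-second dead timer (1 s):", dead_timer_survives_sso(1, **restart))
```

The second case is the conflict the slide warns about: a tuned dead timer expires before the restarted control plane can send its first hello, so neighbors tear down the adjacency that NSF was meant to preserve.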
Simplify STP Network Topology with VSS
Multiple parallel Layer 2 network paths build a looped STP network
VSS with MEC builds a single loop-free network that utilizes all available links
Distributed EtherChannel minimizes STP complexity compared with a standalone distribution design
The STP toolkit should be deployed to safeguard the multilayer network
[Figure: standalone design with an STP blocking port versus a VSS design with a loop-free Layer 2 EtherChannel; callouts for STP Root, Rootguard, BPDU Guard or PortFast, and Port Security]
Simplified, Scalable and Reliable L3 Gateway with VSS
Single logical Layer 3 gateway; eliminates the need to implement FHRP protocols entirely.
Removes FHRP dependencies and increases Layer 3 network scalability.
Hardware-based rapid fault detection and network recovery with default protocol timers.
Deterministic sub-second network convergence under multiple fault conditions.
[Figure: R1 with a single IP gateway]
EtherChannel Link Convergence: Optimal, Fast Traffic Restoration
On a Catalyst switch, EtherChannel link convergence proceeds in four steps:
1. Link failure detection
2. Removal of the failed member from the port-channel entry in software
3. Update of the hardware port-channel indices
4. Notification of the path cost change to the spanning tree and/or routing protocol processes
Layer 2 forwarding table:
VLAN | MAC | Destination Index
10 | AA | Portchannel 1
11 | BB | G5/1
PortChannel 1 = G3/1, G3/2, G4/1, G4/2; the load-balancing hash selects the destination member port.
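The four steps can be modeled with two tables: the Layer 2 forwarding table, whose entry for MAC AA points at the logical Portchannel 1, and the hardware member list behind that index. The toy model below uses CRC32 modulo the member count as a stand-in for the real load-balancing hash (an assumption) and shows why a member-link failure only touches the member list: the MAC entry is unchanged, and flows simply rehash onto the surviving links. Step 4, notifying STP or the routing protocol of the path cost change, is not modeled.

```python
import zlib

# Toy model of the EtherChannel convergence steps above. CRC32 modulo the
# member count stands in for the hardware load-balancing hash (assumption).

# Layer 2 forwarding table: (VLAN, MAC) -> destination index.
MAC_TABLE = {(10, "AA"): "Portchannel 1", (11, "BB"): "G5/1"}

# Hardware port-channel indices: logical port -> active member links.
PORT_CHANNELS = {"Portchannel 1": ["G3/1", "G3/2", "G4/1", "G4/2"]}


def egress_port(vlan: int, mac: str, flow: str) -> str:
    """Resolve the physical egress port for a frame belonging to `flow`."""
    dest = MAC_TABLE[(vlan, mac)]
    members = PORT_CHANNELS.get(dest)
    if members is None:
        return dest  # ordinary physical port, no hashing needed
    return members[zlib.crc32(flow.encode()) % len(members)]


def member_link_failed(port_channel: str, port: str) -> None:
    """Steps 2-3: drop the failed member from the bundle's index list.
    The MAC table is untouched because it points at the logical port-channel."""
    PORT_CHANNELS[port_channel].remove(port)


if __name__ == "__main__":
    flows = [f"10.1.1.{i}->10.2.2.1" for i in range(8)]
    print([egress_port(10, "AA", f) for f in flows])
    member_link_failed("Portchannel 1", "G3/1")
    print([egress_port(10, "AA", f) for f in flows])
```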
Multi-Chassis EtherChannel Performs Better in Any Network Design
The network recovery mechanism varies by distribution design:
‒ Standalone: protocol- and timer-dependent
‒ VSS: hardware-dependent
VSS logical distribution system:
‒ Single P2P STP topology
‒ Single Layer 3 gateway
‒ Single PIM DR system
Distributed and synchronized forwarding tables: MAC addresses, ARP cache, IGMP
All links are fully utilized based on EtherChannel load balancing
[Chart: convergence (sec, 0 to 1) for L2-FHRP versus L2-MEC designs; upstream, downstream and multicast]
The Best Deployment for Standalone Is Routed Access
Simplified operation with a single control plane: routing protocols
Improved network design: no FHRP, STP, trunks, VTP, etc.
Optimized forwarding topology: Layer 3 ECMP
Improved convergence with fewer protocols
[Figure: Layer 2 access design (HSRP active, STP root, Rootguard, Loopguard or Bridge Assurance, Bridge Assurance, BPDU Guard or PortFast, Port Security) compared with routed access running EIGRP/OSPF, with the Layer 2/Layer 3 boundary at the access layer]
OSPF SPF Tuning
timers throttle spf 10 100 5000
timers throttle lsa all 10 100 5000
timers lsa arrival 80
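The OSPF SPF tuning line timers throttle spf 10 100 5000 sets an initial delay of 10 ms, a hold time of 100 ms and a maximum wait of 5000 ms. The simplified model below shows the resulting exponential backoff for back-to-back topology changes. It assumes the hold time doubles on each further trigger and does not model the return to the initial delay once the network has been quiet for a while.

```python
# Simplified model of OSPF SPF exponential backoff for
# "timers throttle spf <start> <hold> <max-wait>" (values in ms).
# Assumes back-to-back triggers with no quiet period that would reset the timer.


def spf_wait_times(start_ms: int, hold_ms: int, max_wait_ms: int, runs: int) -> list:
    """Delay applied before each of `runs` consecutive SPF calculations."""
    waits = []
    for n in range(runs):
        if n == 0:
            waits.append(start_ms)  # first run waits only the initial delay
        else:
            # hold time doubles on every further trigger, capped at max-wait
            waits.append(min(hold_ms * 2 ** (n - 1), max_wait_ms))
    return waits


if __name__ == "__main__":
    print(spf_wait_times(10, 100, 5000, 9))
```

A single failure is therefore handled after only 10 ms, while a flapping link quickly backs off to one SPF run every 5 seconds.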
VSS Simplifies Routed Access
Builds a single point-to-point routing peer adjacency with MEC
EtherChannel delivers deterministic network recovery
Minimizes the need to adjust protocol timers and parameters
[Figure: single EIGRP/OSPF adjacency between the access switch and the VSS distribution]
Routed Access Optimized Multicast Operation
Layer 2 access has two multicast routers on the access subnet, forcing one of them to discard frames
Routed access has a single multicast router, which simplifies management of the multicast topology
[Figure: in Layer 2 access, the Designated Router (high IP address) and the IGMP Querier (low IP address) are different switches, and the non-DR has to drop all non-RPF traffic; in routed access, a single switch is both Designated Router and IGMP Querier]
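The figure's election rules, DR by highest IP address and IGMP querier by lowest, can be sketched with Python's ipaddress module. This assumes all routers carry the same PIM DR priority (a higher priority would override the IP comparison), and the addresses in the example are hypothetical. Comparing ipaddress objects rather than strings matters: as strings, "10.120.4.9" sorts above "10.120.4.10".

```python
import ipaddress

# Election rules as labeled in the figure: PIM DR = highest IP address,
# IGMP querier = lowest IP address. Assumes equal PIM DR priority on all routers.


def pim_dr(routers: list) -> str:
    return max(routers, key=ipaddress.ip_address)


def igmp_querier(routers: list) -> str:
    return min(routers, key=ipaddress.ip_address)


def same_router(routers: list) -> bool:
    """True when one router is both DR and querier, so no frames are discarded."""
    return pim_dr(routers) == igmp_querier(routers)


if __name__ == "__main__":
    layer2_access = ["10.120.4.2", "10.120.4.3"]  # two distribution switches
    routed_access = ["10.120.4.1"]                 # the access switch itself
    print(pim_dr(layer2_access), igmp_querier(layer2_access), same_router(layer2_access))
    print(same_router(routed_access))
```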
VSS Optimizes Multicast Performance with Routed Access
Single logical L3 path to the RP from the access layer to join the multicast distribution tree
Single OIL/IIL PIM interface in the multicast routing table
Increases multicast bandwidth capacity with all MEC member links programmed for switching
Transparent to network faults; provides deterministic sub-second multicast data recovery
[Figure: single PIM join message and single OIL]
OIL = Outgoing Interface List; IIL = Incoming Interface List
Routed Access Provides Rapid Convergence with Optimized Traffic Flow and Ease of Mgmt
CEF and protocol-based network recovery in the standalone routed access design:
‒ EIGRP converges in <200 msec
‒ OSPF with sub-second tuning converges in <200 msec
‒ Multicast with sub-second tuning converges in ~600 msec
EtherChannel hash-based network recovery in the VSS routed access design:
‒ Deterministic sub-second unicast and multicast network convergence
‒ EtherChannel does not require any further protocol tuning
[Chart: convergence (sec, 0 to 0.7) for EIGRP-ECMP, EIGRP-MEC, OSPF-ECMP and OSPF-MEC; upstream, downstream and multicast]
Diversify Links For Module Redundancy
When possible, distribute multiple connections to a single or logical remote system across different linecard modules.
The recovery mechanism is the same as for a link failure.
Prevents topology changes or forwarding updates and provides intra-chassis sub-second recovery.
Depending on network load, it minimizes network congestion.
[Figure: intra-chassis recovery and inter-chassis recovery]
Best Practice for Module OIR
Module OIR is supported on all modular systems.
Module OIR has a higher impact on network recovery due to:
‒ OIR detection
‒ Hardware synchronization
‒ Protocol dependencies
‒ Forwarding updates
Minimize network impact with the following techniques:
‒ Admin power down
‒ Admin reset
[Chart: convergence (sec, 0 to 2.5) for OIR, power down and soft reset; upstream, downstream and multicast]
6500 Standalone:
6500E(config)#no power enable module <slot-id>
6500 VSS:
6500-VSS(config)#no power enable switch <1|2> module <slot-id>
Core Layer Routing Design Strategy
Design Campus Core with Simplicity
Optimize Routing Topologies:
Hide Topology – EtherChannel
Hide Reachability – Route Summarization
Filter – Stub, Distribute-list, Route-Maps
High-Performance, Reliable Network Design
Increase Application Performance
Deterministic Network Recovery
VSS Enabled Campus Design: End-to-End VSS Design Option
[Figure: end-to-end VSS campus connecting to the data center, WAN and Internet]
Scalable and Hitless Core Design Alternative with Nexus 7000
Standalone Redundant Core System
High-scale, high-performance system.
Hitless forwarding design with a distributed forwarding architecture, decoupled from centralized control and management.
Highly available: hitless forwarding, NSF/SSO, EC, ISSU, etc.
[Figure: Nexus 7000 core connecting to the data center, WAN and Internet]
Deploy EtherChannel for a Simplified, Optimized and Reliable Core
6500-VSS: Single Unified Core System
‒ Single point-to-point network per neighbor
‒ Simplified, optimized and resilient unicast and multicast network design
‒ Highly available: VSS, Quad-Sup, NSF/SSO, MEC, eFSU, etc.
Nexus 7000: Standalone Redundant Core System
‒ Single point-to-point network per neighbor
‒ EtherChannel ECMP to simplify, optimize and build a resilient network design
‒ Highly available: hitless forwarding, NSF/SSO, EC, ISSU, etc.
EIGRP Is Unique with Multi-Level Summarization Capability
The greatest advantages of EIGRP are gained when the network has a structured addressing plan that allows for use of summarization and stub routers
EIGRP provides the ability to implement multiple tiers of summarization and route filtering
Able to maintain a deterministic convergence time in very large L3 topologies
[Figure: multi-tier summarization; 10.10.0.0/17 and 10.10.128.0/17 (IPv6: 2001:DB8:10::/56 and 2001:DB8:10:128::/56) are summarized to 10.10.0.0/16 (IPv6: 2001:DB8:10::/48)]
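Before configuring summaries, it helps to check that the component prefixes actually fall inside the summary and whether they aggregate cleanly. A minimal sketch with Python's ipaddress module, using the IPv4 prefixes from the figure; the helper names are hypothetical.

```python
import ipaddress

# Checks for a summarization plan like the one in the figure above.


def fits_summary(summary: str, prefixes: list) -> bool:
    """True if every component prefix lies inside the advertised summary."""
    agg = ipaddress.ip_network(summary)
    return all(ipaddress.ip_network(p).subnet_of(agg) for p in prefixes)


def tightest_aggregates(prefixes: list) -> list:
    """Merge contiguous prefixes into the fewest networks that cover exactly
    the same address space (no unused space is swallowed)."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


if __name__ == "__main__":
    blocks = ["10.10.0.0/17", "10.10.128.0/17"]
    print(fits_summary("10.10.0.0/16", blocks))
    print(tightest_aggregates(blocks))
```

Non-contiguous blocks stay separate in tightest_aggregates, which is a hint that a single summary would also cover address space the distribution block does not own.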
EIGRP Convergence Is Improved with Summarization, Filtering and EtherChannel
EIGRP convergence is largely dependent on query paths and response times
‒ Implement EtherChannel to reduce query paths
‒ Minimize the number of query responses and the time they take, to speed up convergence
‒ Summarize distribution block routes upstream to the core
‒ Configure all L3 access switches as EIGRP stub routers
! Configure L3 access switches as EIGRP stub routers
router eigrp 100
 network 10.0.0.0
 eigrp stub connected
!
! Summarize distribution block routes upstream to the core
interface TenGigabitEthernet 4/1
 ip summary-address eigrp 100 10.120.0.0 255.255.0.0
[Figure: query and response paths between access, distribution and core]
Avoid Default Route Black Hole
Know the default-route source in the network.
EIGRP advertises the default route if it exists in the routing table.
Maintain network availability in the campus by advertising the following routes to EIGRP stub routers:
‒ Summarized internal route
‒ Default route
router eigrp 100
 network 10.0.0.0
 distribute-list EIGRP_STUB_Routes out <Port-Channel#>
!
ip access-list standard EIGRP_STUB_Routes
 permit 10.0.0.0
 permit 0.0.0.0
!
[Figure: campus with WAN, Internet and data center blocks; distribution blocks 10.1.0.0/16, 10.2.0.0/16, 10.3.0.0/16, 10.4.0.0/16 and 10.5.0.0/16]
OSPF Area Boundaries Offer Summarization for Improved Scale
Area boundaries provide buffers between fault domains
Keep area 0 for core infrastructure
Do not extend area 0 to the access routers when using routed access
[Figure: area 0 in the core, with distribution blocks in areas 100, 110 and 120; WAN, Internet and data center attached]
OSPF Downstream Summarization Is Accomplished with Multiple Area Types
ABR for a regular area forwards:
‒ Summary LSAs (Type 3)
‒ ASBR summary (Type 4)
‒ Specific externals (Type 5)
Stub area ABR forwards:
‒ Summary LSAs (Type 3)
‒ Summary default (0.0.0.0/0 or ::/0)
A totally stubby area ABR forwards:
‒ Summary default (0.0.0.0/0 or ::/0)
router ospf 100
 area 120 stub no-summary
 network 10.120.0.0 0.0.255.255 area 120
 network 10.122.0.0 0.0.255.255 area 0
[Figure: OSPF area 120]
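The ABR behavior listed above amounts to a per-area-type filter. The sketch below encodes that list as a lookup table and applies it to a few hypothetical backbone LSAs; the string labels are illustrative, and the summary default is modeled as something the ABR originates only for stub and totally stubby areas.

```python
# LSA types an ABR forwards into each area type, per the list above.
ABR_FORWARDS = {
    "regular": {"type-3 summary", "type-4 asbr-summary", "type-5 external"},
    "stub": {"type-3 summary", "summary default"},
    "totally-stubby": {"summary default"},
}


def lsas_entering(area_type: str, lsas: list) -> list:
    """Filter (lsa_kind, prefix) tuples down to what the ABR sends into the area."""
    allowed = ABR_FORWARDS[area_type]
    return [lsa for lsa in lsas if lsa[0] in allowed]


if __name__ == "__main__":
    backbone = [
        ("type-3 summary", "10.110.0.0/16"),
        ("type-5 external", "192.0.2.0/24"),
        ("summary default", "0.0.0.0/0"),
    ]
    for kind in ABR_FORWARDS:
        print(kind, lsas_entering(kind, backbone))
```

With area 120 stub no-summary, access routers in area 120 see only the summary default, which is what keeps their LSA database small.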
OSPF Upstream Summarization Helps Minimize LSA Churn in the Core
Summarize routes from the distribution block upstream into the core:
‒ Minimize the number of LSAs and routes in the core
‒ Reduce the need for SPF calculations due to internal distribution block changes
[Figure: ABRs originate summaries 10.120.0.0/16 and 2001:DB8:10:120::/48]
router ospf 100
 area 120 stub no-summary
 area 120 range 10.120.0.0 255.255.0.0 cost 10
 network 10.120.0.0 0.0.255.255 area 120
 network 10.122.0.0 0.0.255.255 area 0
!
interface Vlan120
 ip address 10.120.0.1 255.255.255.192
!
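The configuration above uses two mask styles for the same 10.120.0.0/16 block: network statements take a wildcard mask (0.0.255.255), while area range takes a subnet mask (255.255.0.0). The hypothetical helpers below derive both from one prefix with Python's ipaddress module and confirm that the Vlan120 subnet falls inside the range summary; the optional cost keyword is left out.

```python
import ipaddress

# The OSPF configuration above mixes two mask styles for the same /16:
# "network" statements take a wildcard mask, "area ... range" takes a subnet mask.


def network_stmt(prefix: str, area: int) -> str:
    net = ipaddress.ip_network(prefix)
    return f"network {net.network_address} {net.hostmask} area {area}"


def area_range_stmt(prefix: str, area: int) -> str:
    net = ipaddress.ip_network(prefix)
    return f"area {area} range {net.network_address} {net.netmask}"


def inside_range(interface_cidr: str, prefix: str) -> bool:
    """True if an interface subnet is covered by the area range summary."""
    return ipaddress.ip_interface(interface_cidr).network.subnet_of(
        ipaddress.ip_network(prefix))


if __name__ == "__main__":
    print(network_stmt("10.120.0.0/16", 120))
    print(area_range_stmt("10.120.0.0/16", 120))
    print(inside_range("10.120.0.1/26", "10.120.0.0/16"))  # interface Vlan120
```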
OSPF Cost Matters in EtherChannel Designs
Route metrics (bandwidth) are automatically adjusted on EtherChannel interfaces
The maximum bandwidth used for cost computation differs between operating systems:
‒ IOS: 10G (default) *
‒ NX-OS: 40G (default) **
In an OSPF EC/MEC design, a single core-layer member-link failure may:
‒ Under-utilize network resources
‒ Build an asymmetric forwarding topology
‒ Increase network convergence time
[Figure: two EC/MEC core designs advertising summary net 10.100.0.0/16, one with auto-cost 10G and one with auto-cost 40G (Nexus 7000); path costs shown include 1, 3 and 5]
* Adjustable. Recommended to keep default
** Recommended to adjust the OSPF auto-cost reference bandwidth to 10G on Nexus 7000:
N7K-Core(config-router)#auto-cost reference-bandwidth 10000
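The effect of the reference bandwidth on a member-link failure can be worked through with the standard OSPF formula. This sketch assumes cost = reference bandwidth / link bandwidth, truncated with a minimum of 1, and a bundle bandwidth equal to the sum of its active 10G members. With a 10G reference, a 2 x 10G bundle costs 1 before and after losing a member; with a 40G reference, the cost jumps from 2 to 4, the kind of change that can shift traffic and build the asymmetric topology described above.

```python
# OSPF auto-cost arithmetic for an EtherChannel/MEC link, assuming
# cost = reference bandwidth // link bandwidth (minimum 1), and that the
# bundle bandwidth is the sum of its active member links.


def ospf_cost(ref_bw_mbps: int, link_bw_mbps: int) -> int:
    return max(1, ref_bw_mbps // link_bw_mbps)


def cost_before_after(ref_bw_mbps: int, members: int, member_bw_mbps: int = 10000):
    """Cost of the bundle with all members up, then with one member down."""
    if members < 2:
        raise ValueError("need at least two member links to lose one")
    full = ospf_cost(ref_bw_mbps, members * member_bw_mbps)
    degraded = ospf_cost(ref_bw_mbps, (members - 1) * member_bw_mbps)
    return full, degraded


if __name__ == "__main__":
    # 2 x 10G bundle
    print("ref 10G:", cost_before_after(10000, 2))
    print("ref 40G:", cost_before_after(40000, 2))
```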
Optimize EtherChannel Load Balancing
Egress data traffic is load-shared based on a hash of input fields
Optimal load sharing results with:
‒ Bucket-based load sharing: bundle member links in powers of 2 (2/4/8)
‒ Multiple variations of hash input (L2 to L4)
Recommended algorithm *:
‒ Access: Src/Dst IP
‒ Dist/Core: Src/Dst IP + Src/Dst L4 ports
* May vary based on your network traffic pattern
[Figure: access, distribution and core layers with per-platform callouts: default src-mac, recommended src-dst-ip; default src-dst-ip vlan, recommended src-dst-mixed-ip-port vlan; default src-dst-ip, recommended src-dst ip-l4port-vlan; default src-dst-ip vlan, recommended src-dst-mixed-ip-port; default src-dst-ip vlan, recommended src-dst-mixed-ip-port vlan]
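The power-of-2 recommendation follows from bucket-based hashing. This sketch assumes 8 hash buckets assigned round-robin to member links (a common Catalyst design, but an assumption here) and shows that only 1, 2, 4 or 8 members receive equal shares; with 3 members, for example, the split is 3/3/2 buckets, or 37.5%/37.5%/25% of hashed traffic.

```python
# Bucket-based EtherChannel load sharing: the hash result selects one of a
# fixed number of buckets, and buckets are spread across member links.
# 8 buckets is an assumption here (common on Catalyst platforms).


def bucket_shares(members: int, buckets: int = 8) -> list:
    """Buckets owned by each member link when assigned round-robin."""
    if not 1 <= members <= buckets:
        raise ValueError("member count must be between 1 and the bucket count")
    shares = [0] * members
    for b in range(buckets):
        shares[b % members] += 1
    return shares


def is_even(members: int, buckets: int = 8) -> bool:
    """True if every member link owns the same number of buckets."""
    return len(set(bucket_shares(members, buckets))) == 1


if __name__ == "__main__":
    for n in range(2, 9):
        shares = bucket_shares(n)
        print(n, shares, [f"{100 * s / sum(shares):.1f}%" for s in shares])
```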
Layer 3 Load Balancing Can Be Randomized with a Unique ID Associated with Switch
The "Universal ID" concept (also called Unique ID) is used to prevent CEF polarization
‒ Universal ID is generated at bootup (32-bit pseudo-random value seeded by the router’s base IP address)
‒ Universal ID is used as input to the ECMP hash, introducing variability of the hash result at each network layer
‒ Universal ID is supported on Catalyst 6500 Sup-32 and Sup-720
‒ Universal ID is supported on Catalyst 4500 SupII+10GE, SupV-10GE and Sup6E
[Figure: hash using source IP (SIP), destination IP (DIP) and Universal ID at each layer]
Catalyst 4500 Load-Sharing Options
Original | Src IP + Dst IP
Universal* | Src IP + Dst IP + Unique ID
Include Port | Src IP + Dst IP + (Src or Dst Port) + Unique ID
Catalyst 6500 PFC3** Load-Sharing Options
Default* | Src IP + Dst IP + Unique ID
Full | Src IP + Dst IP + Src Port + Dst Port
Full Exclude Port | Src IP + Dst IP + (Src or Dst Port)
Simple | Src IP + Dst IP
Full Simple | Src IP + Dst IP + Src Port + Dst Port
* = default load-sharing mode
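CEF polarization is easy to reproduce with a toy model. In the sketch below, SHA-256 over Src IP + Dst IP + a unique ID stands in for the platform hash (an assumption), and the ID values 7 and 9 are hypothetical. When both ECMP tiers use the same hash input, every flow that tier 1 sends on link 0 also hashes to link 0 at tier 2, leaving the second tier-2 link idle; giving each tier a different unique ID spreads those flows across both links.

```python
import hashlib

# Toy demonstration of CEF polarization with two tiers of 2-way ECMP.
# SHA-256 stands in for the platform hash; the unique IDs are hypothetical.


def ecmp_pick(src: str, dst: str, n_links: int, unique_id: int) -> int:
    """Pick an ECMP link from Src IP + Dst IP + unique ID."""
    key = f"{src}|{dst}|{unique_id}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % n_links


def tier2_links_used(flows, tier1_id: int, tier2_id: int) -> set:
    """Links used by the tier-2 router that receives tier-1 link 0 traffic."""
    arrived = [f for f in flows if ecmp_pick(*f, 2, tier1_id) == 0]
    return {ecmp_pick(*f, 2, tier2_id) for f in arrived}


if __name__ == "__main__":
    flows = [(f"10.1.{i}.10", f"10.2.{i}.20") for i in range(256)]
    print("same hash input at both tiers:", tier2_links_used(flows, 7, 7))
    print("different unique IDs:", tier2_links_used(flows, 7, 9))
```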
Simple Network Design Delivers Deterministic Network Recovery
Routing-protocol-independent network convergence
‒ ECMP Prefix-Independent Convergence (PIC) with the 6500 (VSS/standalone) from 12.2(33)SXI2
‒ Hardware-based fault detection and recovery in MEC/EC designs
[Chart: Time for ECMP/MEC Unicast Recovery. Convergence (sec, 0 to 3.5) versus number of unicast routes (500 to 25,000) on a Core/Distribution Sup720-10GE, for ECMP (w/o PIC), ECMP (with PIC) and MEC]
VSS Core Simplifies Multicast Operation, Improves Performance and Redundancy
A standalone core needs Anycast MSDP peering for RP redundancy.
A VSS-based core simplifies PIM RP redundancy with NSF/SSO/MMLS technologies.
ECMP builds a single multicast forwarding path.
MEC increases multicast forwarding capacity by utilizing all member links.
[Figure: standalone core with two PIM RPs peering via Anycast MSDP, where a PIM join from the distribution PIM router builds a single OIL; versus a VSS core with a single logical PIM RP, single logical PIM interface and single logical OIL toward a single logical distribution PIM router, with multiple multicast forwarding paths]
Simplified Multicast Network Design Delivers Deterministic Network Recovery
ECMP multicast recovery depends on mroute scale and can range into seconds.
MEC/EC multicast recovery is hardware-based, scale-independent and sub-second.
[Chart: Time for ECMP/MEC Multicast Recovery. Convergence (sec, 0 to 6) versus number of multicast routes (100 to 5000) on a Core/Distribution Sup720-10GE, for ECMP and MEC/EC]
Do I Still Need Dual Supervisors?
Redundant physical paths protect network availability:
‒ Converge in sub-second
‒ May not maintain capacity and performance
‒ Increase outage probability during a major node failure
A redundant supervisor module protects network and service availability:
‒ Maintains capacity and performance
‒ System remains in service during a major supervisor failure
‒ Hitless to insignificant data loss during switchover
[Figure: failure scenarios labeled single point of failure, reduced capacity, self-recovery fail, reduced capacity]
Supervisor Redundancy Provides Stateful Switchover
1:1 supervisor redundancy architecture
Stateful synchronization:
‒ System variables
‒ Configuration: running/startup
‒ Layer 2/3 protocol state and topologies
‒ Policies: ACLs, QoS, etc.
‒ Linecard status
The active supervisor owns the control plane and builds the central and distributed forwarding tables
Graceful system recovery by protecting hardware and software state machines
Architecture varies between modular systems
NSF Works with SSO to Keep Neighbors Forwarding During a Supervisor Switchover
Non-Stop Forwarding provides graceful restart enhancements to EIGRP, OSPF, IS-IS, BGP and LDP
An NSF-capable router continuously forwards packets during an SSO processor recovery
NSF-aware and NSF-capable routers provide for transparent routing protocol recovery
‒ Graceful restart extensions enable neighbor recovery without resetting adjacencies
‒ Routing database re-synchronization occurs in the background
[Figure: neighbors labeled NSF-Aware/NSF-Capable, NSF-Aware, and NSF-Aware/NSF-Capable]
Cisco vs IETF OSPF NSF Capability
[Figure: message exchange between an NSF-capable and an NSF-aware router for each method. Cisco NSF: restart event; fast hellos (2 sec interval) with the RS bit set, then clear; database description and LSA requests/updates (out-of-band sync); hellos with the RS bit clear. IETF NSF: restart event; LS Update (Grace LSA) and LS ACK (Grace LSA) to announce graceful restart; hellos (OSPF discovery); database description and LSA requests/updates (database exchange). Hellos are sent to 224.0.0.5.]
Recommendation: when peering with an IETF-capable device, use the IETF NSF capability with the "nsf ietf" command under the routing process.
StackWise+ Provides Stack-Ring Redundancy
Mixed processing architecture:
‒ Centralized (master): CDP, LACP, Layer 3 (ARP/routing/multicast) and management plane
‒ Distributed (all stack members): MAC learning, STP, QoS, ACL, etc.
Distributed forwarding architecture:
‒ Single forwarding table: the master synchronizes the RIB/FIB with all stack-member switches
‒ Local switching: within a port ASIC, and between port ASICs through the local switch fabric
1:N master switch redundancy in the stack ring, with dynamic re-election after failure
Protects the distributed L2/L3 FIB and gracefully restarts routing adjacencies
[Figure: stack master with distributed FIB, uplinked to a distribution pair joined by a VSL]
Designating Master/Slave Switch in Stack-Ring
Any stack member can become master. Increasing switch priority is recommended for a deterministic role.
Master switch failure detection, propagation and re-election can take 2-3 seconds.
The network recovery mechanism differs by design:
‒ Master switch with uplink
‒ Master switch without uplink (recommended)
[Figure: master (priority 15) and slave (priority 14) in the stack ring]
!Increase Master Switch Priority to 15 (highest)
switch 5 priority 15
!Increase Slave Switch Priority to 14 (lower than Master)
switch 6 priority 14
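The effect of the priority commands can be sketched as a simplified election. This assumes the highest priority wins and ties go to the lowest MAC address; the real StackWise election applies further criteria (including keeping an existing master) before the MAC tie-break, and the switch numbers, MAC addresses and default priority of 1 used for the other members are assumptions for illustration.

```python
# Simplified StackWise master election: highest priority wins, ties go to the
# lowest MAC address. The real election also considers the current master and
# other criteria before MAC; switch numbers, MACs and default priority 1 here
# are assumptions for illustration.


def mac_value(mac: str) -> int:
    return int(mac.replace(":", ""), 16)


def elect_master(members: list) -> int:
    """members: list of (switch_number, priority, mac); returns the master's number."""
    winner = max(members, key=lambda m: (m[1], -mac_value(m[2])))
    return winner[0]


if __name__ == "__main__":
    stack = [(n, 1, f"00:1b:2c:00:00:{n:02x}") for n in (1, 2, 3, 4, 7, 8, 9)]
    stack += [(5, 15, "00:1b:2c:00:00:05"), (6, 14, "00:1b:2c:00:00:06")]
    print("master:", elect_master(stack))
    survivors = [m for m in stack if m[0] != 5]
    print("after master failure:", elect_master(survivors))
```

Pinning the backup role with priority 14 makes re-election after a master failure land on a known switch rather than whichever member has the lowest MAC.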
Multilayer – StackWise Plus Master Switch Recovery Analysis
Stack Design – 1: Master switch with uplink; no slave switch (same priority)
Stack Design – 2: Master switch with uplink; slave switch (w/o uplink) set in stack ring
Stack Design – 3 (Recommended): Master switch w/o uplink; slave switch (w/o uplink) set in stack ring
[Chart: Catalyst 3750-X StackWise Plus Master Failure Analysis. Convergence (sec, 0 to 2.5) for designs 1, 2 and 3; upstream and downstream]
Graceful Routing with StackWise Plus
Routing adjacencies and the L3 FIB are preserved during a Master failure.
Graceful routing (NSF) is supported for EIGRP and OSPF.
The network recovery mechanism differs between designs:
‒ Master switch with uplink
‒ Master switch without uplink (recommended)
[Figure: NSF recovery with EIGRP/OSPF after a Master failure; the distributed FIB keeps forwarding]
!Enable NSF under each routing process
router eigrp 100
 nsf
!
router ospf 100
 nsf
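Conceptually, NSF keeps forwarding on the preserved FIB while the routing control plane restarts on the new Master, then reconciles the FIB with the routes relearned from graceful-restart-aware neighbors. The class below is a simplified model of that behavior, not Cisco's implementation; the prefixes and next-hop names are hypothetical.

```python
class NsfForwarder:
    """Simplified model (not Cisco's implementation): the FIB survives a control-plane restart."""

    def __init__(self, routes):
        self.fib = dict(routes)   # prefix -> next hop, as programmed in hardware
        self.stale = set()
        self.restarting = False

    def control_plane_restart(self):
        # The new Master keeps the existing entries but marks them stale
        self.restarting = True
        self.stale = set(self.fib)

    def forward(self, prefix):
        # Forwarding keeps using the preserved FIB, even during the restart
        return self.fib.get(prefix)

    def resync(self, relearned):
        # Relearned routes refresh the FIB; entries not relearned are flushed
        for prefix, next_hop in relearned.items():
            self.fib[prefix] = next_hop
            self.stale.discard(prefix)
        for prefix in self.stale:
            del self.fib[prefix]
        self.stale = set()
        self.restarting = False


node = NsfForwarder({"10.1.0.0/16": "Po1", "10.2.0.0/16": "Po2"})
node.control_plane_restart()
print(node.forward("10.2.0.0/16"))   # Po2: still forwarding during the restart
node.resync({"10.1.0.0/16": "Po1"})
print(node.forward("10.2.0.0/16"))   # None: not relearned, so flushed
```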
Routed Access – StackWise Plus Master Switch Recovery Analysis
Stack Design – 1: Master switch with uplink; no Slave switch (same priority)
Stack Design – 2: Master switch with uplink; Slave switch (without uplink) set in the stack ring
Stack Design – 3 (Recommended): Master switch without uplink; Slave switch (without uplink) set in the stack ring
[Figure: Catalyst 3750-X StackWise Plus Master Failure Analysis – EIGRP Routed Access: convergence (sec) of upstream and downstream traffic for Designs 1, 2 and 3]
[Figure: Catalyst 3750-X StackWise Plus Master Failure Analysis – OSPF Routed Access: convergence (sec) of upstream and downstream traffic for Designs 1, 2 and 3]
4500E SSO Architecture Protects Network Availability and Capacity
1+1 supervisor redundancy architecture
Centralized processing architecture: the Active supervisor maintains all three planes
Real-time hardware and software state-machine synchronization from the Active to the Standby supervisor
Centralized forwarding engine switches data traffic between linecard modules
Stub linecards: no local switching
Decoupled control and forwarding planes protect network capacity during a soft or administratively forced switchover, such as an IOS software upgrade
[Figure: Catalyst 4500E with five linecards and Active/Standby supervisors in SSO redundancy; each supervisor has a forwarding engine (FFE/VFE) and a shared memory fabric (PPE/IPP)]
MultiLayer – 4500E Supervisor Switchover Analysis
[Figure: convergence (sec) of upstream, downstream and multicast traffic after a 4500E Active-to-Standby supervisor switchover]
Stateful Layer 2 protocol synchronization
‒ STP, MAC table, IGMP snooping, PAgP, etc.
Protects network capacity
‒ Maintains all uplinks, including those on the failed Sup
‒ All linecard modules remain operational
Deterministic <100 msec convergence
‒ The forwarding engine decouples the control and forwarding planes
‒ Sup fabric connectivity remains operational even after a failure
Routed Access – 4500E Supervisor Switchover Analysis
[Figure: convergence (sec) of upstream, downstream and multicast traffic after a 4500E Active-to-Standby supervisor switchover with EIGRP/OSPF]
Stateful Layer 3 protocol synchronization
‒ EIGRP, OSPF, ARP, etc.
‒ PIM SSO capability is not supported.
Deterministic <100 msec unicast convergence
‒ The forwarding engine decouples the control and forwarding planes
‒ Sup fabric connectivity remains operational even after a failure
6500-E VSS Architecture
[Figure: a standalone Catalyst 6500-E with Active and Standby supervisors (SF, PFC, RP) and linecards linked by the internal EOBC (intra-chassis SSO redundancy), next to a VSS pair, VSS-SW1 and VSS-SW2, whose supervisors are linked by the external EOBC over the VSL (inter-chassis SSO redundancy)]
SF: Switch Fabric; PFC: Policy Feature Card
RP: Route Processor; EOBC: Ethernet Out-of-Band Channel
Internal EOBC: internal control channel between the supervisor and the linecards within a single chassis
External EOBC: external control channel between the supervisors of two chassis
VSS Dual-Sup Inter-Chassis Redundancy
VSS dual-sup (one supervisor per virtual switch) supports inter-chassis SSO redundancy.
The single in-chassis supervisor runs in the SSO Active or Standby role.
Stateful SSO synchronization and redundancy between the virtual switches
Single-sup system design:
‒ A supervisor switchover requires a chassis reset, including all linecards and service modules
‒ Network capacity is reduced until the system returns to an operational state
[Figure: after an Active supervisor failure, the Standby virtual switch becomes Active with NSF recovery while network capacity is reduced]
VSS Quad-Sup Extends HA Capability
Starting with 12.2(33)SXI4, Sup720-10GE VSS supports two supervisor redundancy modes:
‒ Dual-Sup: one Sup per virtual switch
‒ Quad-Sup: two Sups per virtual switch
Dual-Sup offers a single redundancy option:
‒ Inter-chassis only. Resetting the Active or Standby supervisor reboots all installed modules.
‒ A Sup hardware failure may increase MTTR, reduce network capacity and service availability, and make the network less reliable.
Quad-Sup offers dual redundancy options:
‒ Inter-chassis: same design as dual-sup
‒ Intra-chassis: allows the virtual switch to return to service, reduces MTTR and stabilizes the network after a major fault
[Figure: with dual-sup, self-recovery fails and the chassis is a single point of failure with reduced capacity and NSF recovery; with quad-sup, a new Active supervisor takes over]
Sup720-10GE Quad-Sup Redundancy
VSS Quad Sup Supports Dual HA Mode
[Figure: SW1 and SW2 joined by the VSL, with inter-chassis Sup redundancy between the ICA supervisors (SSO Active and SSO Standby) and intra-chassis Sup redundancy between each ICA and its ICS (RPR-WARM)]
Dual in-chassis supervisors, each in a different redundancy mode:
‒ In-chassis Active supervisor (ICA): SSO Active or Standby mode
‒ In-chassis Standby supervisor (ICS): RPR-WARM mode
Stateful SSO synchronization from the SSO Active to the Standby supervisor
System configuration synchronization between the ICA and ICS supervisors
The chassis resets when the ICA supervisor resets
Sup720-10GE Quad-Sup Redundancy
VSS Quad Sup RPR-WARM Design
Provides system redundancy during a major ICA failure.
RPR-WARM: the Sup runs in a hybrid operational mode:
‒ ICS supervisor: RPR cold state with extended capabilities
‒ DFC linecard: a distributed linecard whose available 1G/10G uplink ports all provide network connectivity
The ICS synchronizes the following configuration from the ICA:
‒ Startup-Configuration
‒ VLAN Database
‒ Boot Variable
‒ VSS Virtual-Switch ID
[Figure: SW1 and SW2 joined by the VSL; ICA supervisors in SSO Active and SSO Standby, ICS supervisors in RPR-WARM]
6500#show switch virtual redundancy | inc Switch|Current Software
My Switch Id = 1
Peer Switch Id = 2
Switch 1 Slot 5 Processor Information :
Current Software state = ACTIVE
Switch 1 Slot 6 Processor Information :
Current Software state = RPR-Warm
Switch 2 Slot 5 Processor Information :
Current Software state = STANDBY HOT (switchover target)
Switch 2 Slot 6 Processor Information :
Current Software state = RPR-Warm
Sup720-10GE Quad-Sup Redundancy
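To check the quad-sup roles from a script, the filtered `show switch virtual redundancy` output on this slide can be parsed line by line. This sketch assumes exactly the output format shown here; real output may carry extra spacing or fields, and collecting the text from the switch is out of scope.

```python
import re

# Assumes the "| inc Switch|Current Software" output format shown on this slide
SLOT_RE = re.compile(r"Switch (\d+) Slot (\d+) Processor Information")
STATE_RE = re.compile(r"Current Software state = (.+)")


def parse_vss_redundancy(output):
    """Map (switch, slot) to the software state shown for that supervisor."""
    states = {}
    current = None
    for line in output.splitlines():
        slot = SLOT_RE.search(line)
        if slot:
            current = (int(slot.group(1)), int(slot.group(2)))
            continue
        state = STATE_RE.search(line)
        if state and current is not None:
            states[current] = state.group(1).strip()
            current = None
    return states


SAMPLE = """\
My Switch Id = 1
Peer Switch Id = 2
Switch 1 Slot 5 Processor Information :
Current Software state = ACTIVE
Switch 1 Slot 6 Processor Information :
Current Software state = RPR-Warm
Switch 2 Slot 5 Processor Information :
Current Software state = STANDBY HOT (switchover target)
Switch 2 Slot 6 Processor Information :
Current Software state = RPR-Warm
"""

states = parse_vss_redundancy(SAMPLE)
ics = sorted(key for key, state in states.items() if state == "RPR-Warm")
print(ics)   # [(1, 6), (2, 6)]
```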
Graceful VSS Quad-Sup Deployment
Step – 1 Software upgrade: upgrade the VSS supervisors (Active/Standby) to 12.2(33)SXI4 or later. Maintain network availability during the software upgrade with enhanced Fast Software Upgrade (eFSU).
Step – 2 Deploy ICS: install the redundant (ICS) supervisors in each virtual-switch chassis. Boot up the ICS supervisors with the same software version and license as the ICA.
Step – 3 Redesign VSL: build full-mesh VSL physical paths between the four supervisor modules, and bundle the new VSL connections in the VSL EtherChannel.
Failure to follow the recommended procedure may destabilize the VSS system and network operation.
Sup720-10GE Quad-Sup Redundancy
Installing ICS Supervisor With Mismatch IOS Version
Incompatible IOS software between the ICA and ICS supervisors may force the ICS to fall back to ROMMON mode.
An ICS that boots software with Quad-Sup capability can be allowed to boot with a mismatched IOS version, so that the common software version can then be installed.
Disabling the IOS mismatch check has no effect if the ICS boots without Quad-Sup capability (pre-12.2(33)SXI4).
[Figure: ICAs in SSO Active and SSO Standby run 12.2(33)SXI4; ICSs running 12.2(33)SXI5 reach RPR-WARM, while ICSs running 12.2(33)SXI3 fall back to ROMMON]
!Allow the ICS to boot with a mismatched IOS version
6500-VSS(config)#no switch virtual in-chassis standby bootup mismatch-check
!
Sup720-10GE Quad-Sup Redundancy
ICS Supervisor IOS Upgrade Process
Step – 1 Disable the IOS software version mismatch check from global configuration mode:
‒ 6500-VSS(config)#no switch virtual in-chassis standby bootup mismatch-check
Step – 2 Insert the ICS supervisor module in both chassis. Intra-chassis role negotiation allows the ICS to complete the bootup process in RPR-WARM mode.
Step – 3 Copy the ICA-compatible IOS software version to both ICS supervisor modules:
‒ 6500-VSS#copy <image_src_path> sw1-slot6-disk0:<image>
‒ 6500-VSS#copy <image_src_path> sw2-slot6-disk0:<image>
Step – 4 Re-enable the IOS software version mismatch check from global configuration mode. Leaving it disabled may cause the chassis to go into RPR mode at the next switchover.
‒ 6500-VSS(config)#switch virtual in-chassis standby bootup mismatch-check
Step – 5 Force an ICS supervisor module reset. During the next bootup, the ICS module boots with the ICA-compatible IOS software version:
‒ 6500-VSS#hw-module switch 1 ics reset
‒ 6500-VSS#hw-module switch 2 ics reset
Sup720-10GE Quad-Sup Redundancy
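Because the order of Steps 1 to 5 matters (the mismatch check must be back on before the ICS reset), it can help to generate the run-book rather than type it by hand. The helper below, `ics_upgrade_plan` (a hypothetical name), only builds the list of commands from this slide and does not connect to any switch. The ICS slot number (6, as in the example paths) and the placeholder image names are assumptions to adjust for a given chassis.

```python
def ics_upgrade_plan(image_src, image, switches=(1, 2), ics_slot=6):
    """Return the ICS upgrade steps in order as (mode, action) pairs.

    Assumption: the ICS sits in slot 6, as in the example copy paths.
    """
    plan = [("config", "no switch virtual in-chassis standby bootup mismatch-check")]
    plan.append(("manual", "insert the ICS supervisor in each chassis; wait for RPR-WARM"))
    for sw in switches:
        plan.append(("exec", f"copy {image_src} sw{sw}-slot{ics_slot}-disk0:{image}"))
    # The mismatch check must be re-enabled BEFORE the ICS reset (Step 4 before Step 5)
    plan.append(("config", "switch virtual in-chassis standby bootup mismatch-check"))
    for sw in switches:
        plan.append(("exec", f"hw-module switch {sw} ics reset"))
    return plan


for mode, action in ics_upgrade_plan("<image_src_path>", "<image>"):
    print(f"{mode:>6}: {action}")
```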
Dual and Quad Sup SSO Analysis
MEC-based network recovery with VSS in a dual- or quad-sup design.
Deterministic sub-second network convergence for unicast and multicast data traffic.
Only an SSO Active failure triggers graceful protocol recovery.
[Figure: 6500-VSS Dual/Quad Sup NSF/SSO Analysis – Unicast Application: convergence (sec) of upstream and downstream traffic for EIGRP-ECMP, EIGRP-MEC, OSPF-ECMP and OSPF-MEC]
[Figure: 6500-VSS Dual/Quad Sup NSF/SSO Analysis – Multicast Application: convergence for ECMP and MEC with Active-IIL and Standby-IIL]
Sup720-10GE Quad-Sup Redundancy
VSS Dual Sup – VSL Design
Two Cisco-recommended designs:
Profile 1 – VSL on Supervisor (Sup2T/Sup720-10GE)
Cost-effective solution that leverages both supervisor uplinks.
Non-VSL-capable linecards can continue to be used for 10G core connections.
The redundant fibers connect through a common fabric and ASICs, which could make the system less stable.
Optimal and preset VSL parameters: load balancing, QoS, HA, traffic engineering, dual-active, etc.
Restricted to bundling 2 x VSL ports, or 20G of switching capacity, per virtual-switch node.
Profile 2 – Diversified VSL between Supervisor (Sup2T/Sup720-10GE) and VSL-Capable Linecard
Redundant and diversified fibers between the supervisor and next-generation VSL-capable linecards.
Same design as Profile 1, but increases system reliability because the VSL ports are diversified across different fabrics/ASICs.
Optimal and preset VSL parameters: load balancing, QoS, HA, traffic engineering, dual-active, etc.
Flexible to scale up to 8 x VSL for a high-density system that aggregates uplinks, service modules, single-homed connections, etc.
Sup2T and Sup720-10GE Design
VSS Quad Sup – VSL Design
Same design as Profile – 1 Dual Sup
[Figure: SW1 and SW2 with four supervisors (Sup-1 to Sup-4) joined by the VSL]
Flexible to increase VSL capacity.
Continues to leverage existing non-VSL 10G linecards for uplink connections.
Retains all original VSL benefits.
Vulnerable design during any supervisor self-recovery fault incident.
Recommended Full-Mesh VSL on Quad-Sup
[Figure: full-mesh VSL between the four supervisors (Sup-1 to Sup-4) of SW1 and SW2]
Highly redundant and cost-effective VSL design.
Increases overall VSL capacity.
Maintains 20G VSL capacity during a supervisor failure.
Increases network reliability by minimizing the probability of dual-active.
Sup720-10GE Quad-Sup VSL Redundancy
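The VSL capacity figures above follow from simple arithmetic. A sketch, assuming every VSL member is a 10GE supervisor uplink and each Sup720-10GE contributes two of them (consistent with the 2 x VSL ports, or 20G per node, quoted for Profile 1); the function name is hypothetical.

```python
VSL_PORT_GBPS = 10   # assumption: each VSL member is a 10GE supervisor uplink
PORTS_PER_SUP = 2    # assumption: two VSL-capable 10GE uplinks per supervisor


def vsl_capacity(sups_per_chassis, failed_sups=0):
    """VSL bandwidth (Gbps) per virtual-switch node when all sup uplinks are VSL members."""
    working = max(sups_per_chassis - failed_sups, 0)
    return working * PORTS_PER_SUP * VSL_PORT_GBPS


print(vsl_capacity(1))                  # 20: dual-sup, Profile 1
print(vsl_capacity(2))                  # 40: quad-sup full-mesh VSL
print(vsl_capacity(2, failed_sups=1))   # 20: quad-sup after one supervisor fails
```

With a single supervisor per chassis, losing that supervisor also removes every VSL member on that side, which is why the full-mesh quad-sup design is the one that keeps 20G through a supervisor failure.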
VSS Dual-Active Detection Redundancy
Failure of all VSL links forces both virtual switches to transition to the ACTIVE role, a condition known as Dual-Active.
A Dual-Active condition confuses neighbor devices and destabilizes the network.
Two detection and recovery mechanisms:
‒ Direct: Dual-Active Fast-Hello or BFD
‒ Indirect: Enhanced PAgP (ePAgP)
Using both ePAgP and Fast-Hello is recommended for redundancy.
The BFD detection mechanism is deprecated starting with 15.0(SY1).
[Figure: ePAgP over a Layer 2 Port-Channel to Catalyst 2K/3K/4K and over a Layer 3 Port-Channel, plus a direct Fast-Hello link between the virtual switches]
!Enable Enhanced PAgP on a trusted L2/L3 Port-Channel interface
6500-VSS(config-vs-domain)#dual-active detection pagp trust channel-group 101
!
!Enable dual-active fast-hello on directly connected interfaces (copper/fiber)
6500-VSS(config)#interface range Gi1/1/1 , Gi2/1/1
6500-VSS(config-if-range)#dual-active fast-hello
Dual-Sup or Quad-Sup VSL Redundancy
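The Dual-Active condition described above can be modeled as a small state machine. This toy model assumes two chassis, counts VSL member links, and treats any configured detection mechanism as successful. The recovery step (the former Active chassis entering a recovery state) is an assumption about VSS behavior that this slide does not spell out, and the link counts are hypothetical.

```python
class VssPair:
    """Toy model of VSL loss, dual-active detection and recovery."""

    def __init__(self, vsl_links, detection=()):
        self.vsl_up = vsl_links
        self.roles = {"SW1": "ACTIVE", "SW2": "STANDBY"}
        self.detection = set(detection)   # e.g. {"fast-hello", "epagp"}

    def fail_vsl_links(self, count):
        self.vsl_up = max(self.vsl_up - count, 0)
        if self.vsl_up == 0:
            # With no VSL left, the standby also transitions to ACTIVE
            self.roles = {name: "ACTIVE" for name in self.roles}

    def dual_active(self):
        return list(self.roles.values()).count("ACTIVE") == 2

    def detect_and_recover(self, former_active="SW1"):
        """Return True if a configured mechanism detected and resolved dual-active."""
        if not (self.dual_active() and self.detection):
            return False
        # Assumption: the previously active chassis enters recovery mode
        self.roles[former_active] = "RECOVERY"
        return True


vss = VssPair(vsl_links=4, detection={"fast-hello", "epagp"})
vss.fail_vsl_links(3)
print(vss.dual_active())          # False: one VSL link still up
vss.fail_vsl_links(1)
print(vss.dual_active())          # True: all VSL links lost
print(vss.detect_and_recover())   # True
print(vss.roles)                  # {'SW1': 'RECOVERY', 'SW2': 'ACTIVE'}
```

Without any detection mechanism configured, the model stays in the dual-active state, which is the situation that the redundant ePAgP plus Fast-Hello recommendation guards against.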
Dual-Active Recovery Analysis
Dual-Active network recovery depends on:
‒ Uplink network design: ECMP vs MEC
‒ Routing protocols: EIGRP vs OSPF
‒ Detection mechanism: Fast-Hello vs ePAgP
In the OSPF ECMP design, OSPF detects the failure faster than ePAgP does, which results in slow network convergence.
Starting with 12.2(33)SXI3, Dual-Active Fast-Hello performs rapid failure detection and delivers deterministic recovery independent of network design and protocol.
[Figure: 6500E VSS – Dual-Active Recovery Analysis – ePAgP: convergence (sec) of upstream and downstream traffic for EIGRP-ECMP, EIGRP-MEC, OSPF-ECMP and OSPF-MEC]
[Figure: 6500E VSS – Dual-Active Recovery Analysis – Fast-Hello: convergence (sec) of upstream and downstream traffic for the same four designs]
Dual-Sup or Quad-Sup VSL Redundancy
Nexus 7000 Distributed Architecture
[Figure: Nexus 7018 with ACTIVE and STANDBY supervisors (URIB, MRIB, FIB) kept in SSO synchronization, a distributed IPFIB/MFIB with local switching on the I/O modules, and five crossbar fabric modules providing 46Gbps/slot each]
URIB: Unicast Routing Information Base; MRIB: Multicast Routing Information Base
FIB: Forwarding Information Base; MFIB: Multicast Forwarding Information Base
Hitless Supervisor Redundancy with Nexus 7000
1+1 supervisor redundancy architecture
Decouples the centralized control plane from the distributed forwarding plane
Redundant central arbiter
Hitless supervisor switchover with:
‒ Distributed I/O modules
‒ Crossbar fabric modules
[Figure: Active and Standby supervisors, each with a CPU, CMP and central arbiter (CA), switching roles with NSF recovery]
Fabric Module Capacity and Redundancy
Per-slot bandwidth: 46Gbps, 92Gbps, 138Gbps, 184Gbps and 230Gbps with one to five fabric modules
Each fabric module in the Nexus 7018 provides:
‒ 1 x 23G channel per supervisor slot
‒ 2 x 23G channels per I/O module slot (46Gbps/slot)
For an 8x10GE I/O module: one fabric module gives insufficient capacity, two are required for 80G/slot, and additional modules provide N+1 redundancy and are future proof.
[Figure: five crossbar fabric modules in a Nexus 7018, each adding 46Gbps/slot]
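The per-slot fabric figures above can be reproduced with a few lines of arithmetic. A sketch, assuming each fabric module adds 46 Gbps to every I/O module slot and that an 8x10GE module needs 80G/slot; the function names are hypothetical.

```python
import math

GBPS_PER_FABRIC = 46   # assumption: 2 x 23G channels per I/O module slot, per fabric module
MAX_FABRICS = 5        # fabric module slots in the Nexus 7018


def slot_bandwidth(fabrics):
    return fabrics * GBPS_PER_FABRIC


def fabrics_required(slot_demand_gbps, n_plus_one=True):
    needed = math.ceil(slot_demand_gbps / GBPS_PER_FABRIC) + (1 if n_plus_one else 0)
    if needed > MAX_FABRICS:
        raise ValueError("demand exceeds the chassis fabric capacity")
    return needed


def insufficient_after_failure(installed, slot_demand_gbps, failed=1):
    # True when a crossbar failure leaves less fabric bandwidth than the module needs
    return slot_bandwidth(installed - failed) < slot_demand_gbps


print([slot_bandwidth(n) for n in range(1, 6)])   # [46, 92, 138, 184, 230]
print(fabrics_required(80, n_plus_one=False))      # 2
print(fabrics_required(80))                        # 3
print(insufficient_after_failure(2, 80))           # True
print(insufficient_after_failure(3, 80))           # False
```

With only two fabric modules, one crossbar failure leaves an 80G/slot module with 46 Gbps, which corresponds to the insufficient-xbar-bandwidth condition on the crossbar failure slide; a third module keeps 92 Gbps available.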
Nexus 7000 Crossbar Failure May Cause Fabric Congestion
%XBAR-2-XBAR_INSUFFICIENT_XBAR_BANDWIDTH: Module in slot 1 has insufficient xbar-bandwidth.
[Figure: after a crossbar failure, an 80G/slot module left with 46G/slot has asymmetric forwarding capacity, while a module that keeps 80G/slot retains symmetric forwarding capacity; no topology change in either case]
A crossbar fabric module failure reduces internal switching capacity and may cause congestion.
The supervisor and I/O modules remain operational.
No network topology change is triggered.
Hitless Fabric Switching with Nexus 7000
Hitless fabric switchover when a crossbar fabric module is removed:
Step – 1 Right and left ejectors open
Step – 2 Software is signaled to start graceful data re-routing
Step – 3 Hitless data re-routing
Step – 4 Fabric interface shutdown
Step – 5 Crossbar fabric module power down
In-Service Software Upgrade Allows Upgrade Without Taking the Switch Down
In a redundant topology, standard maintenance practice is to shut down devices during an upgrade and let the network converge.
ISSU provides the ability to upgrade software in place without shutting down.
Offers significant uptime improvements.
[Figure: scheduled maintenance leaves the network at half capacity, while with ISSU all paths and switches stay active during the upgrade]
ISSU – Graceful IOS Software Upgrade Cycle
Start: ACTIVE = OLD, STANDBY = OLD
issu loadversion: the Standby Sup reboots with the new software version (ACTIVE = OLD, STANDBY = NEW)
issu runversion: SSO switchover; the new software becomes effective (ACTIVE = NEW, STANDBY = OLD)
issu acceptversion: acknowledges successful activation of the new software, optional (ACTIVE = NEW, STANDBY = OLD)
issu commitversion: commits and reboots the STANDBY with the new software (ACTIVE = NEW, STANDBY = NEW)
issu abortversion: returns to the original version
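The upgrade cycle above is a small state machine in which each command is only valid after the previous one. A minimal sketch, assuming two supervisors named `sup1`/`sup2` (hypothetical) and modeling only the image each one runs and which one is Active. That `abortversion` is accepted up to, but not after, `commitversion` is an assumption of this model.

```python
class IssuSession:
    """Tracks which supervisor runs which image through the ISSU cycle."""

    def __init__(self):
        self.version = {"sup1": "OLD", "sup2": "OLD"}
        self.active, self.standby = "sup1", "sup2"
        self.stage = "init"

    def _require(self, *stages):
        if self.stage not in stages:
            raise RuntimeError(f"command not valid in stage {self.stage!r}")

    def loadversion(self):
        self._require("init")
        self.version[self.standby] = "NEW"   # Standby reboots with the new image
        self.stage = "loaded"

    def runversion(self):
        self._require("loaded")
        # SSO switchover: the Standby running NEW becomes Active
        self.active, self.standby = self.standby, self.active
        self.stage = "run"

    def acceptversion(self):
        self._require("run")
        self.stage = "accepted"              # optional acknowledgement

    def commitversion(self):
        self._require("run", "accepted")
        self.version[self.standby] = "NEW"   # remaining Standby reboots with NEW
        self.stage = "committed"

    def abortversion(self):
        # Assumption: abort is allowed until commitversion
        self._require("loaded", "run", "accepted")
        if self.stage != "loaded":
            # Undo the runversion switchover so the original Sup is Active again
            self.active, self.standby = self.standby, self.active
        self.version = {"sup1": "OLD", "sup2": "OLD"}
        self.stage = "init"

    def roles(self):
        return {"ACTIVE": self.version[self.active], "STANDBY": self.version[self.standby]}


s = IssuSession()
for step in (s.loadversion, s.runversion, s.acceptversion, s.commitversion):
    step()
    print(step.__name__, s.roles())
```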
ISSU Software Upgrade Prep List
Save the system configuration locally and on a remote server (TFTP/FTP).
Copy the new software (same version/license) to the local storage of the Active and Standby supervisors, and change the boot parameters to the new software version.
Ensure NSF capability is enabled under the routing process.
Avoid the following major system changes until the software upgrade process completes:
‒ Adding or removing hardware modules
‒ Modifying the software configuration
‒ Modifying the boot registers
Simplified Catalyst 4500E ISSU Upgrade Process
Manual upgrade
‒ Supported on all supervisor modules
‒ Attentive four-step manual software upgrade process: issu loadversion (LV), runversion (RV), acceptversion (AV) and commitversion (CV)
‒ Opportunity to verify the new software before completing the upgrade
Automatic upgrade
‒ Supported supervisor modules: Sup7E starting with 3.1.0SG; Sup6E/Sup6L-E starting with 15.0.2SG
‒ Single-CLI, automated software upgrade process: issu changeversion (ChV) drives the runversion (RV) and commitversion (CV) steps
‒ Opportunity to schedule the upgrade to the new software
Recommendation: use both methods for a safe and graceful software roll-out in large deployments.
Catalyst 4500E Network Recovery with ISSU
Protects network capacity during the entire software upgrade process.
Real-time software upgrade with NSF/SSO capability.
Completes the entire software upgrade process with <50 msec of loss in the multilayer design.
[Figure: 4500E Network Recovery With ISSU Software Upgrade – Multilayer Design: convergence (sec) of upstream, downstream and multicast traffic at issu loadversion, issu runversion and issu commitversion]
[Figure: 4500E Network Recovery with ISSU Software Upgrade – Routed Access Design: convergence (sec) of upstream, downstream and multicast traffic at the same three steps]
VSS Inter-Chassis Software Upgrade Process
[Figure: SW1 and SW2 joined by the VSL, exchanging Active and Standby roles during the upgrade]
Step – 1 ISSU LoadVersion: triggers the Standby chassis to reset with the new software version.
Step – 2 ISSU RunVersion: forces an SSO switchover and makes the new software version operational. The new Active starts graceful protocol recovery, and the Active switch starts the ISSU rollback timer after the Standby becomes operational.
Step – 3 ISSU AcceptVersion: stops the rollback timer.
Step – 4 ISSU CommitVersion: triggers the Standby chassis to reset with the new software version.
Starting with 12.2(33)SXI, the 6500 VSS supports enhanced Fast Software Upgrade (eFSU).
Dual-Sup eFSU Upgrade Process
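The rollback timer acts as a safety net: if the operator does not confirm the new software with AcceptVersion in time, the system reverts to the original version. A minimal sketch of that timing logic, using plain numeric timestamps in seconds; the class name and the 600-second timeout are hypothetical, not the platform default.

```python
class RollbackTimer:
    """ISSU rollback timer: started after runversion, stopped by acceptversion."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s   # hypothetical value; the platform sets its own
        self.started_at = None

    def start(self, now):
        # Started once the Standby becomes operational after runversion
        self.started_at = now

    def stop(self):
        # issu acceptversion
        self.started_at = None

    def should_roll_back(self, now):
        return self.started_at is not None and now - self.started_at >= self.timeout_s


timer = RollbackTimer(timeout_s=600)
timer.start(now=0)
print(timer.should_roll_back(now=300))   # False: still inside the window
print(timer.should_roll_back(now=600))   # True: no acceptversion in time
timer.start(now=0)
timer.stop()
print(timer.should_roll_back(now=900))   # False: acceptversion stopped the timer
```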
VSS Quad Sup Software Upgrade Process
[Figure: SW1 and SW2 joined by the VSL, exchanging Active and Standby roles during the upgrade]
Step – 1 ISSU LoadVersion: triggers the ICA and ICS supervisor modules in the Standby chassis to reset with the new software version.
Step – 2 ISSU RunVersion: forces an SSO switchover and makes the new software version operational. The new Active starts graceful protocol recovery, and the Active switch starts the ISSU rollback timer after the Standby becomes operational.
Step – 3 ISSU AcceptVersion: stops the rollback timer.
Step – 4 ISSU CommitVersion: triggers the ICA and ICS supervisor modules in the Standby chassis to reset with the new software version.
Sup720-10GE Quad-Sup eFSU Upgrade Process
Catalyst 6500E VSS Network Recovery with eFSU
Network capacity is reduced until the Standby chassis becomes operational.
Network availability is maintained with MEC.
MEC-based recovery allows the complete software upgrade process to finish with ~1 second of traffic loss.
[Figure: 6500E – VSS Dual/Quad Sup Network Recovery with eFSU Software Upgrade: convergence (sec) of upstream, downstream and multicast traffic at issu loadversion, issu runversion and issu commitversion]
Nexus 7000 NX-OS ISSU Benefits
Simplified – a single CLI command upgrades (system/kickstart) several distributed hardware components.
Automated – fully automates the upgrade process in serial order.
Reliable – runs a compatibility test of the new software against the current hardware inventory and generates an impact report before initiating the upgrade.
Hitless – a graceful, non-disruptive procedure that leverages the distributed forwarding architecture to upgrade the entire system with zero packet loss.
[Figure: hitless ISSU upgrade of the system, kickstart, CMP and CMP-BIOS images on both supervisors and of the I/O module BIOS]
Nexus 7000 NX-OS ISSU Upgrade Prep List
Save the system configuration locally and on a remote server (TFTP/FTP).
Copy the new software to the local storage of the Active and Standby supervisors.
Run the new-software compatibility test and generate a detailed upgrade analysis report:
‒ show install all impact system bootflash:/<system-image-name> kickstart bootflash:/<kickstart-image-name>
Avoid the following major system changes until the software upgrade process completes:
‒ Adding or removing hardware modules
‒ Modifying the software configuration
‒ Modifying the boot registers
Compatibility check is done:
Module  bootable          Impact  Install-type  Reason
------  --------  --------------  ------------  ------
     1       yes  non-disruptive       rolling
     2       yes  non-disruptive       rolling
     5       yes  non-disruptive         reset
     6       yes  non-disruptive         reset

Module  Image      Running-Version(pri:alt)                 New-Version         Upg-Required
------  ---------  ---------------------------------------  ------------------  ------------
     1  lc1n7k     5.0(5)                                   5.1(1a)             yes
     1  bios       v1.10.14(04/02/10):v1.10.14(04/02/10)    v1.10.14(04/02/10)  no
     2  lc1n7k     5.0(5)                                   5.1(1a)             yes
     2  bios       v1.10.14(04/02/10):v1.10.14(04/02/10)    v1.10.14(04/02/10)  no
     5  system     5.0(5)                                   5.1(1a)             yes
     5  kickstart  5.0(5)                                   5.1(1a)             yes
     5  bios       v3.22.0(02/20/10):v3.22.0(02/20/10)      v3.22.0(02/20/10    no
     5  cmp        5.0(2)                                   5.1(1)              yes
     5  cmp-bios   02.01.05                                 02.01.05            no
     6  system     5.0(5)                                   5.1(1a)             yes
     6  kickstart  5.0(5)                                   5.1(1a)             yes
     6  bios       v3.22.0(02/20/10):v3.22.0(02/20/10)      v3.22.0(02/20/10    no
     6  cmp        5.0(2)                                   5.1(1)              yes
     6  cmp-bios   02.01.05                                 02.01.05            no

Do you want to continue with the installation (y/n)?  [n] Y
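Before answering the installation prompt, the first table of the compatibility report above can be checked from a script. A sketch that assumes the column layout shown (Module, bootable, Impact, Install-type); the function names are hypothetical, and capturing the text from the switch is left out.

```python
def parse_impact_table(report):
    """Return {module: (bootable, impact, install_type)} from the first report table.

    Assumes the 'Module bootable Impact Install-type Reason' layout shown above.
    """
    rows = {}
    in_table = False
    for line in report.splitlines():
        fields = line.split()
        if fields[:2] == ["Module", "bootable"]:
            in_table = True
            continue
        if not in_table:
            continue
        if fields and fields[0].isdigit():
            module, bootable, impact, install_type = fields[:4]
            rows[int(module)] = (bootable, impact, install_type)
        elif rows and (not fields or fields[0] == "Module"):
            break  # a blank line or the next table header ends the impact table
    return rows


def is_hitless(rows):
    # Proceed only if every module is bootable and the impact is non-disruptive
    return bool(rows) and all(
        bootable == "yes" and impact == "non-disruptive"
        for bootable, impact, _ in rows.values()
    )


REPORT = """\
Compatibility check is done:
Module  bootable          Impact  Install-type  Reason
------  --------  --------------  ------------  ------
     1       yes  non-disruptive       rolling
     2       yes  non-disruptive       rolling
     5       yes  non-disruptive         reset
     6       yes  non-disruptive         reset

Module  Image      Running-Version(pri:alt)  New-Version  Upg-Required
     1  lc1n7k     5.0(5)                    5.1(1a)      yes
"""

rows = parse_impact_table(REPORT)
print(is_hitless(rows))   # True
```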
Nexus 7000 Hitless NX-OS Upgrade Process
Step – 1 install all … starts the compatibility test and generates the impact report. Based on the user's response, the ISSU upgrade process proceeds or terminates.
N7K#install all system bootflash:///<system-image-name> kickstart bootflash:///<kickstart-image-name>
Step – 2 Updates the boot variable and resets the Standby supervisor so that it reboots with the new NX-OS software.
Step – 3 The Active supervisor resets and performs a hitless SSO switchover, then reboots with the new NX-OS software. This step puts the new NX-OS into effect.
Step – 4 Starts a non-disruptive I/O module upgrade in serial order. Each module's CPU rolls over to the new NX-OS software while the module remains operational during the upgrade.
Step – 5 Upgrades the CMP processor and BIOS on the Active and Standby supervisors.
[Figure: hitless ISSU upgrade of the system, kickstart, CMP and CMP-BIOS images on the Active and Standby supervisors and of the I/O module BIOS]
Summary
Simplify and optimize your campus network design with system and network consolidation to maintain application performance even during common network faults.
Leverage hardware-based fault detection for scale-independent and deterministic network recovery.
Build a non-stop communication network with system-level redundancy in every campus layer: access, distribution and core.
Design a mission-critical campus backbone that offers scale flexibility, key foundational services and uncompromised high availability.
Reduce the maintenance window and upgrade systems while maintaining network availability.