BRKCRS-3146
Advanced VPC Operation and Troubleshooting
Follow us on Twitter for real time updates of the event:
@ciscoliveeurope, #CLEUR
Dmitry Goloubev Technical Leader, Tech services
© 2011 Cisco and/or its affiliates. All rights reserved. Cisco Public BRKCRS-3146 2
Housekeeping
We value your feedback. Don't forget to complete your online session evaluations after each session, and the Overall Conference Evaluation, which will be available online from Thursday
Visit the World of Solutions and Meet the Engineer
Visit the Cisco Store to purchase your recommended readings
Please switch off your mobile phones
After the event don’t forget to visit Cisco Live Virtual: www.ciscolivevirtual.com
Goals
Understand general concepts of Virtual Port Channel feature on Nexus 7000
Review the impact of VPC on bridging and routing
Learn how to troubleshoot VPC
VPC at the network level
VPC makes it possible to build a PortChannel to 2 separate switches, virtualizing the network building block
No blocked ports, more usable bandwidth, load-sharing
Distribution switch or link failure does not mean reconvergence
[Diagram: the same topology shown "from this", "to this" and "or, logically"]
VPC components at a glance
2 active control planes
2 configs
2 points of management
2 active data planes
Primary-Secondary notion for some aspects of operation
Control messages and data frames flow between the primary and secondary peers via the Peer-Link
Peer-Link is an 802.1Q trunk
Control messages are carried by CFS over the Peer-Link
[Diagram: VPC domain with a Primary and a Secondary switch, each with an active control plane and an active data plane, joined by the Peer-Link and the Peer Keepalive link, with a VPC attached to both]
Agenda
Initialization & Redundancy considerations
Spanning Tree
Traffic forwarding
1st hop redundancy
Multicast considerations
Stages of VPC initialization
16:34:06 %VPC-5-VPCM_ENABLED: vPC Manager enabled
16:34:07 %VPC-5-PEER_KEEP_ALIVE_STATUS: In domain 2, peer keep-alive status changed to enabled …
16:34:17 %ETHPORT-5-IF_UP: Interface port-channel2 is up in Layer3 … (port-channel2 is the Peer-Keepalive link)
16:34:19 %VPC-3-VPC_PEER_LINK_BRINGUP_FAILED: vPC peer-link bringup failed (vPC peer is not reachable over cfs) …
16:34:19 %ETHPORT-5-IF_UP: Interface port-channel1 is up in mode trunk (port-channel1 is the Peer-Link)
16:34:23 %VPC-4-VPC_ROLE_CHANGE: In domain 2, VPC role status has changed to primary …
16:34:23 %VPC-5-VPC_DELAY_SVI_BUP_TIMER_START: vPC restore, delay interface-vlan bringup timer started
16:34:33 %VPC-5-VPC_DELAY_SVI_BUP_TIMER_EXPIRED: vPC restore, delay interface-vlan bringup timer expired, reiniting interface-vlans
16:34:33 %INTERFACE_VLAN-5-UPDOWN: Line Protocol on Interface vlan 4, changed state to up
16:34:33 %VPC-5-VPC_RESTORE_TIMER_START: vPC restore timer started to reinit vPCs
16:34:41 %VPC-3-VPC_BRINGUP_FAILED: vPC 102 bringup failed (Peer-link state is not UP)
16:35:03 %VPC-5-VPC_RESTORE_TIMER_EXPIRED: vPC restore timer expired, reiniting vPCs
16:35:13 %VPC-5-VPC_UP: vPC 102 is up
16:35:13 %ETHPORT-5-IF_UP: Interface port-channel102 is up in mode trunk
1. VPC manager starts
2. Peer-keepalive comes up (receives keepalives from the peer)
3. Peer-link comes up (data is not passing through yet, just CFS)
4. Primary/Secondary Role resolved
5. Global Consistency check
6. Peer-link is up for data
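7. SVIs brought up (10 sec after the role is resolved and the peer-link is up, by default)
8. VPCs brought up (30 sec after the SVIs, by default)
Both timers are adjustable in the VPC domain configuration context:
SVI timer: ‘delay restore interface-vlan’
VPC timer: ‘delay restore’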
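The two delays can be read straight out of the syslog. The following is a minimal Python sketch, not a Cisco tool: it pairs each `*_TIMER_START` message with its `*_TIMER_EXPIRED` message and reports the elapsed seconds. The function name `timer_durations` is hypothetical, and the log lines are the ones shown in the initialization log above.

```python
from datetime import datetime

# Syslog lines copied from the initialization log above (time of day only).
LOG = """\
16:34:23 %VPC-5-VPC_DELAY_SVI_BUP_TIMER_START: vPC restore, delay interface-vlan bringup timer started
16:34:33 %VPC-5-VPC_DELAY_SVI_BUP_TIMER_EXPIRED: vPC restore, delay interface-vlan bringup timer expired, reiniting interface-vlans
16:34:33 %VPC-5-VPC_RESTORE_TIMER_START: vPC restore timer started to reinit vPCs
16:35:03 %VPC-5-VPC_RESTORE_TIMER_EXPIRED: vPC restore timer expired, reiniting vPCs
"""


def timer_durations(log_text):
    """Return {timer_name: seconds} for each START/EXPIRED message pair."""
    started, durations = {}, {}
    for line in log_text.splitlines():
        stamp, _, rest = line.partition(" ")
        mnemonic = rest.split(":", 1)[0]        # %VPC-5-VPC_RESTORE_TIMER_START
        name = mnemonic.rsplit("-", 1)[-1]      # VPC_RESTORE_TIMER_START
        when = datetime.strptime(stamp, "%H:%M:%S")
        if name.endswith("_TIMER_START"):
            started[name[: -len("_START")]] = when
        elif name.endswith("_TIMER_EXPIRED"):
            key = name[: -len("_EXPIRED")]
            if key in started:
                durations[key] = int((when - started[key]).total_seconds())
    return durations


if __name__ == "__main__":
    for timer, seconds in timer_durations(LOG).items():
        print(timer, seconds)
```

If the measured values differ from the expected 10 and 30 seconds, the timers have probably been changed in the VPC domain configuration. The sketch assumes start and expiry fall on the same day, since the log carries no date.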
VPC consistency checking
Certain configuration mistakes could lead to loops or blackholing [when STP config is inconsistent]. Others might cause undesirable forwarding implications on specific interfaces [inconsistent ACLs, SVIs]
Consistency checking prevents network-wide issues (type 1) and warns about possible forwarding oddities (type 2)

Nexus# sh vpc consistency-parameters interface port-channel 1
Name                     Type  Local Value             Peer Value
-------------            ----  ----------------------  -----------------------
lag-id                   1     [(7f9b,                 [(7f9b,
                               ...
mode                     1     active                  active
STP Port Type            1     Default                 Default
STP Port Guard           1     None                    None
STP MST Simulate PVST    1     Default                 Default
Native Vlan              1     1                       1
Port Mode                1     trunk                   trunk
MTU                      1     1500                    1500
Duplex                   1     full                    full
Speed                    1     10 Gb/s                 10 Gb/s
Allowed VLANs            -     101                     101

Nexus# sh vpc consistency-parameters global
Name                         Type  Local Value             Peer Value
-------------                ----  ----------------------  -----------------------
STP Mode                     1     Rapid-PVST              Rapid-PVST
STP Disabled                 1     None                    None
STP MST Region Name          1     ""                      ""
STP MST Region Revision      1     0                       0
STP MST Region Instance to   1
 VLAN Mapping
STP Loopguard                1     Disabled                Disabled
STP Bridge Assurance         1     Enabled                 Enabled
STP Port Type, Edge          1     Normal, Disabled,       Normal, Disabled,
BPDUFilter, Edge BPDUGuard         Disabled                Disabled
STP MST Simulate PVST        1     Enabled                 Enabled
Interface-vlan admin up      2     101                     101
Interface-vlan routing       2     1,101                   1,101

Inconsistency type | Action | Example of inconsistency
Type 1 / Global | Vlans suspended on peer-link, VPCs up with respective vlans suspended | Rapid-PVST STP on one peer, MST STP on another
Type 1 / Interface | Vlans suspended on respective VPC | MTU mismatch, STP guard config mismatch
Type 2 | Syslog message | SVI is up on one peer, down on another
Graceful Consistency check
VPC Type 1 inconsistency suspends all vlans on corresponding VPC on both peers
This triggers forwarding interruption during config changes (for example while changing MTU on VPC)
As of 4.2(8) and 5.2(1) VPC supports Graceful Consistency Check
Graceful consistency check brings down interfaces on secondary peer upon inconsistency, primary peer keeps forwarding traffic
Enabled by default
Nexus(config-vpc-domain)# graceful consistency-check
Nexus# show vpc brief
vPC domain id                   : 1
Peer status                     : peer adjacency formed ok
vPC keep-alive status           : peer is alive
vPC role                        : secondary
...
Graceful Consistency Check      : Enabled

vPC status
----------------------------------------------------------------------------
id   Port   Status  Consistency  Reason       Active vlans
--   ----   ------  -----------  ------       ------------
1    Po1    down*   failed       vPC type-1   2-10
VPC behavior at initialization
Peer-Keepalives must be heard before we bring up the Peer-Link
VPC control plane must be able to communicate to the peer over peer-link
Negotiate LACP/STP operating roles for the chassis
Wait for per-port peer parameters and handshake to bring up vPC ports
Performs peer parameters consistency check on each VPC bringup
Will not bring up VPCs if only one of two VPC peers comes up (for example after power outage)
VPC Reload Restore
Allows VPCs to be brought up after a timeout if the peer is presumed dead
Default timeout 360 sec
The switch assumes the primary role for STP and LACP
Nexus(config)# vpc domain 1
Nexus(config-vpc-domain)# reload restore ?
  <CR>
  delay  Duration to wait before assuming peer dead and restoring vpcs
Nexus(config-vpc-domain)# reload restore delay ?
  <240-3600>  Time-out for restoring vPC links (in seconds)
VPC auto-recovery (replaces Reload-Restore as of NXOS 5.2.1)
Auto-recovery addresses cases of multiple failures. For example
Peer-link fails and after a while primary switch (or keepalive link) fails
Both VPC peers are reloaded and only one comes back up
How it works
If the Peer-link is down on the secondary switch, 3 consecutive missed peer-keepalives trigger auto-recovery
After a reload (role is ‘none established’), if the auto-recovery timer (240 sec) expires while peer-link and peer-keepalive are still down, auto-recovery kicks in
In both cases the switch assumes the primary role
VPCs are brought up bypassing consistency checks
Nexus(config)# vpc domain 1
Nexus(config-vpc-domain)# auto-recovery
Nexus# sh vpc | i recovery
Auto-recovery status : Enabled (timeout = 240 seconds)
Failure type | Reload restore | Auto recovery
After reload only single peer comes up | √ | √
Peer-link fails, then eventually complete primary switch fails | - | √
Troubleshooting VPC: initialization
Always start with ‘sh vpc’: it gives ~90% of all information needed for initial situation assessment
vpc1# sh vpc
Legend: (*) - local vPC is down, forwarding via vPC peer-link
vPC domain id                     : 1
Peer status                       : peer adjacency formed ok    <- CFS can communicate with the peer
vPC keep-alive status             : peer is alive               <- We hear peer-keepalives
Configuration consistency status  : success                     <- Configs are compatible
Type-2 consistency reason         : Consistency Check Not Performed
vPC role                          : primary                     <- Master/Slave for certain apps
Number of vPCs configured         : 1
Peer Gateway                      : Disabled
Dual-active excluded VLANs        : -

vPC Peer-link status                                            <- Peer-Link is up with expected vlans
---------------------------------------------------------------------
id   Port   Status Active vlans
--   ----   ------ --------------------------------------------------
1    Po100  up     1,101

vPC status                                                      <- Vlans are active on VPCs
----------------------------------------------------------------------
id   Port   Status Consistency Reason   Active vlans
--   ----   ------ ----------- ------   ------------
1    Po1    up     success     success  101
Peer status issue: check if the peer-link is up, check if the remote end is also configured as peer-link, then look at CFS. Note that the peer-link will fully come up only when 1) peer-keepalive is up and 2) peers can talk via CFS over the peer-link
Peer-keepalive issue: check ‘sh vpc keepalive’, check that the outgoing interface is up and in the correct vrf, check the route to the destination (in the correct vrf), ping the remote, and check the same on the remote peer
Role issue: check ‘sh vpc role’ on both sides. Note that the peer that has been up/active the longest will remain operational-active even if the other peer has a better priority. This is done to minimize traffic disruption. If the role is ‘none established’ it means the switch came up after reload/new config, and VPCs will not come up before the role is resolved or reload-restore/auto-recovery kicks in
Consistency issues: check ‘sh vpc consistency global|interface …’
Vlans not up: check if the respective vlan is allowed on the peer-link, check syslog for other causes with ‘sh log log | inc VLANS’
Always keep track of situation on both peers
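The first-pass assessment of ‘sh vpc’ can be scripted when the output is collected from many switches. The following Python sketch is an illustration only: the helper names `parse_fields` and `find_problems` are hypothetical, and the "healthy" values are simply the ones shown in the sample output above, so other NXOS releases may word them differently.

```python
# Header lines taken from the healthy 'sh vpc' sample above.
SAMPLE = """\
Peer status                       : peer adjacency formed ok
vPC keep-alive status             : peer is alive
Configuration consistency status  : success
vPC role                          : primary
"""

# Assumption: these strings are what a healthy peer reports, per the sample.
EXPECTED = {
    "Peer status": "peer adjacency formed ok",
    "vPC keep-alive status": "peer is alive",
    "Configuration consistency status": "success",
}


def parse_fields(text):
    """Turn 'name : value' lines into a dict, ignoring everything else."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def find_problems(text):
    """List the header fields whose value differs from the healthy sample."""
    fields = parse_fields(text)
    return [
        f"{name}: {fields.get(name, 'missing')}"
        for name, wanted in EXPECTED.items()
        if fields.get(name) != wanted
    ]


if __name__ == "__main__":
    print(find_problems(SAMPLE) or "no obvious problems")
```

Run it against the output of both peers; an empty list on one peer says nothing about the state of the other.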
VPC redundancy model
Process restartability: processes checkpoint their runtime state. A crashing process is restarted statefully by the NXOS system manager
Supervisor redundancy: HA-policy will trigger a supervisor switchover in response to excessive process crashing, software, hardware or diagnostic failure
VPC redundancy: devices dual-attached to the VPC domain are protected against single switch failure (power, hardware, maintenance etc)
[Diagram: Switch 1 and Switch 2 in a VPC domain, each with an Active and a Standby (SSO) supervisor running Process 1, Process 2, … Process X]
VPC Keepalive link
Heartbeat between vPC peers to prevent dual-active scenario
Keepalives are sent every second by default on UDP port 3200
3 second hold timeout on peer-link loss: how long we ignore keepalives after peer-link loss
5 second keepalive timeout (starts when the hold timeout ends after peer-link down): how long we wait for a keepalive before declaring failure
Use dedicated link, although NXOS does not enforce this – just IP connectivity is verified
Management interface can be used as keepalive link, but do not connect the interfaces together directly (only active supervisor management interface is up)
Peer Keepalive
Handling Peer-link failure flow
1. Peer-link failure
2. Am I primary?
   Yes (primary): done
   No (secondary): continue with step 3
3. Ignore keepalives for hold-timeout (3 sec)
4. Start keepalive timeout timer (default 5 sec)
5. Received Keepalive?
   Yes: primary is alive, bring down all VPC ports
   No: keepalive timeout expired? If no, keep waiting for a keepalive. If yes, primary is gone, become primary
Note: If primary fails completely once the VPCs are down on secondary, VPCs will stay down until primary recovers
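The decision flow above can be written down as a small function. This is a Python sketch of the flowchart, not NXOS code: the function name, the argument names and the exact boundary handling of the two windows are assumptions made for illustration.

```python
def on_peer_link_failure(role, keepalive_times, hold_timeout=3.0, keepalive_timeout=5.0):
    """Return the action a peer takes after losing the peer-link.

    role            -- "primary" or "secondary"
    keepalive_times -- seconds after the peer-link loss at which keepalives arrived
    """
    if role == "primary":
        return "done"
    # Keepalives inside the hold window are ignored; only those arriving while
    # the keepalive timeout timer is running count as proof of life.
    window_start = hold_timeout
    window_end = hold_timeout + keepalive_timeout
    if any(window_start <= t <= window_end for t in keepalive_times):
        return "primary alive: bring down all VPC ports"
    return "primary gone: become primary"


if __name__ == "__main__":
    print(on_peer_link_failure("secondary", [0.5, 1.5, 3.5]))
    print(on_peer_link_failure("secondary", [0.5, 1.5]))
```

With the default timers the secondary therefore decides within 3 + 5 = 8 seconds of the peer-link loss.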
Handling Peer-link failure flow with Auto-recovery
1. Peer-link down?
   No: keep monitoring
   Yes: continue with step 2
2. Am I primary?
   Yes (primary): done
   No (secondary): continue with step 3
3. Received Keepalive?
   Yes: primary is alive, bring down all VPC ports
4. Missed 3 keepalives in a row?
   No: keep checking
   Yes: primary is gone, become primary, bypass consistency checks, bring up VPCs
Note: Unlike in the previous case the keepalive status is always checked, not only for keepalive hold + keepalive timeout seconds after peer-link failure
[Flowchart elements marked NEW are those added by auto-recovery]
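The continuous check on the secondary can be sketched as a counter of consecutive missed keepalives. This Python fragment illustrates the flow above under stated assumptions: one boolean per keepalive interval while the peer-link is down, and a hypothetical function name `secondary_role_after`.

```python
MISSED_LIMIT = 3  # "Missed 3 keepalives in a row" from the flow above


def secondary_role_after(keepalive_seen):
    """Return the role of the secondary after a sequence of keepalive intervals.

    keepalive_seen -- iterable of bool, True when a keepalive arrived in that
                      interval; the peer-link is assumed down for the whole run.
    """
    missed = 0
    for seen in keepalive_seen:
        missed = 0 if seen else missed + 1
        if missed >= MISSED_LIMIT:
            # Primary is presumed gone: take over, bypass consistency
            # checks and bring the VPCs back up.
            return "primary"
    # Primary still answers (or has not been silent long enough):
    # remain secondary with VPC ports down.
    return "secondary"


if __name__ == "__main__":
    print(secondary_role_after([True, False, False, True]))
    print(secondary_role_after([True, False, False, False]))
```

A single received keepalive resets the counter, which is why a flapping keepalive path delays the takeover.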
If Peer-link and Keepalive both fail …while primary peer is still alive
Dual-active situation
There will be 2 primary switches sending independent BPDUs
VPC port-channels on upstream/downstream switches will be error-disabled by ‘EtherChannel Misconfiguration Guard’ after ~90 seconds, see http://www.cisco.com/en/US/tech/tk389/tk213/technologies_tech_note09186a008009448d.shtml
If a Nexus 7000/5000 is on the other end of the VPC there is no errordisable, as NXOS does not support EtherChannel Guard
Depending on the remote configuration (presence of VPC, peer-switch etc) outcomes range from no impact, to STP dispute, to STP state cycling between dispute, blocking and forwarding
Split vlan
Provision redundancy for keepalive link, make sure it doesn’t share datapath with peer-link
What to do if only 1 peer is operational and VPCs are down (due to power issue, hardware failure on the 2nd peer etc)
VPC(s) will be down if they had to flap or the current peer was reloaded (because the consistency check couldn’t be performed without the 2nd peer)
Non-issue with auto-recovery, but what if the current NXOS version is older than 5.2?
Possible actions
Recover 2nd peer
…or remove VPC config from port-channel(s)
vpc(config-if)# no vpc 123
… or in case of many VPCs, remove VPC config
vpc# sh run vpc > bootflash:myvpc.conf
vpc(config)# no feature vpc

vpc# sh vpc
...
Peer status                        : peer link is down
vPC keep-alive status              : Suspended (Destination IP not reachable)
Configuration consistency status   : failed
Configuration inconsistency reason : Consistency Check Not Performed
vPC role                           : none established
...
vPC status
----------------------------------------------------------------------
id   Port   Status Consistency Reason                 Active vlans
--   ----   ------ ----------- ------                 ------------
102  Po102  down   Not         Consistency Check Not  -
                   Applicable  Performed
Troubleshooting VPC peer-keepalives
Nexus# show vpc peer-keepalive
vPC keep-alive status : peer is alive
--Send status : Success
--Last send at : 2009.06.19 00:41:15 589 ms
--Sent on interface : Eth2/35
--Receive status : Success
--Last receive at : 2009.06.19 00:41:14 580 ms
--Received on interface : Eth2/35
--Last update from peer : (1) seconds, (9) msec
vPC Keep-alive parameters
--Destination : 7.7.7.77
--Keepalive interval : 1000 msec
--Keepalive timeout : 5 seconds
--Keepalive hold timeout : 3 seconds
--Keepalive vrf : v1
--Keepalive udp port : 3200
--Keepalive tos : 192
Nexus# show vpc statistics peer-keepalive
vPC keep-alive status : peer is alive
vPC keep-alive statistics
----------------------------------------------------
peer-keepalive tx count: 9773
peer-keepalive rx count: 8985
average interval for peer rx: 991
Count of peer state changes: 0
Peer-keepalive is only essential at the time when the peer-link goes down or comes up. At any other time a peer-keepalive failure will only trigger a syslog message
Peer-keepalives might be affected by extreme control plane load (check CPU utilization & COPP)
‘Count of peer state changes’ is the number of keepalive state transitions: the closer to 0 the better
Only reception of keepalive packets at IP level is required
Generic routing/switching connectivity troubleshooting might be needed if packets are lost (make sure there is a route/arp in the correct VRF…)
Cisco Fabric Services (CFS)
Transport mechanism for control-plane messaging between VPC peers
Uses
• Consistency validation
• MAC address synchronization
• vPC member port status signalling
• IGMP snooping synchronization
• vPC status signalling
VPC CFS messages are encapsulated in Ethernet frames and delivered to the peer via the peer-link
Nexus# sh cfs application
----------------------------------------------
 Application    Enabled   Scope
----------------------------------------------
 arp            Yes       Physical-eth
 stp            Yes       Physical-eth
 vpc            Yes       Physical-eth
 igmp           Yes       Physical-eth
 l2fm           Yes       Physical-eth
 ...
VPC: CFS troubleshooting
Cisco Fabric Services Transport of control messages between VPC peers
Nexus# show cfs status
Distribution : Enabled
Distribution over IP : Disabled
IPv4 multicast address : 239.255.70.83
IPv6 multicast address : ff15::efff:4653
Distribution over Ethernet : Enabled
Nexus# show cfs peers
Physical Fabric
---------------------------------------------
Switch WWN IP Address
---------------------------------------------
20:00:00:1b:54:c2:42:41 10.48.73.222 [Local]
Nexus
20:00:00:1b:54:c2:42:44 0.0.0.0
Total number of entries = 2
Nexus# show cfs internal ethernet-peer statistics | i Trans|Rece
Number of Segments Transmitted : 218
Number of Acks Transmitted : 223
Maximum Segment Size Transmitted : 0
Number of Transmission Timeouts : 0
Number of segments in Transmit Queue : 0
Number of segments in Re-Transmit Queue : 0
Total Number of Segments Received : 441
Number of Acks Received : 217
Number of Duplicate Messages Received : 0
Number of Unexpected Segments Received : 0
Number of fragmented segments Received : 2
Number of duplicate fragments Received : 0
Number of unfragmented segments Received : 210
Number of Received Segments Dropped : 0
Number of Unreliable segments Transmitted : 1
Number of Unreliable segments Received : 1
Nexus# sh cfs internal notification log name vpc
Sun Nov 14 15:27:22 2010: Peer add 20:00:00:1b:54:c2:42:44
Sun Nov 14 19:05:25 2010: Peer gone 20:00:00:1b:54:c2:42:44
Sun Nov 14 19:08:03 2010: Peer add 20:00:00:1b:54:c2:42:44
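‘show cfs internal ethernet-peer statistics’: TX/RX counters should move when VPC is active or coming up
‘show cfs peers’: the remote peer should be seen
‘sh cfs internal notification log name vpc’: shows timestamps for when CFS communication for VPC was interrupted (peer-reload, peer-link issues etc)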
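To turn the notification log into outage durations, each ‘Peer gone’ entry can be paired with the next ‘Peer add’ for the same WWN. The following Python sketch assumes the log format shown above; the function name `cfs_outages` is hypothetical.

```python
from datetime import datetime

# Entries copied from the notification log above.
LOG = """\
Sun Nov 14 15:27:22 2010: Peer add 20:00:00:1b:54:c2:42:44
Sun Nov 14 19:05:25 2010: Peer gone 20:00:00:1b:54:c2:42:44
Sun Nov 14 19:08:03 2010: Peer add 20:00:00:1b:54:c2:42:44
"""


def cfs_outages(log_text):
    """Return [(wwn, seconds)] for every 'Peer gone' followed by a 'Peer add'."""
    gone_at = {}
    outages = []
    for line in log_text.splitlines():
        stamp, sep, event = line.partition(": Peer ")
        if not sep:
            continue  # not a peer add/gone entry
        action, wwn = event.split()
        when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        if action == "gone":
            gone_at[wwn] = when
        elif action == "add" and wwn in gone_at:
            seconds = int((when - gone_at.pop(wwn)).total_seconds())
            outages.append((wwn, seconds))
    return outages


if __name__ == "__main__":
    for wwn, seconds in cfs_outages(LOG):
        print(f"{wwn} unreachable over CFS for {seconds} s")
```

For the sample log this reports a single interruption of 158 seconds (19:05:25 to 19:08:03). The first ‘Peer add’ has no preceding ‘Peer gone’ and is ignored.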
Swapping Primary Secondary roles
Sometimes it is preferred for operational reasons to have specific switch as primary
VPCs are down for ~1 minute after primary changes to secondary
Approach
1. Change role priority
2. Bounce peer-link
vpc1(config)# vpc domain 2
vpc1(config-vpc-domain)# role priority 60
Warning:
 !!:: vPCs will be flapped on current primary vPC switch while attempting role change ::!!
Note:
 --------:: Change will take effect after user has re-initd the vPC peer-link ::--------
vpc1(config-vpc-domain)# int po1
vpc1(config-if)# shut
....
vpc1(config-if)# no shut
...
21:28:34 %VPC-5-ROLE_PRIORITY_CFGD: In domain 2, vPC role priority changed to 60
21:28:34 %VPC-5-SYSTEM_PRIO_CFGD: In domain 2, vPC system priority changed to 32667
21:28:36 %ETHPORT-5-IF_DOWN_NONE: Interface port-channel102 is down (None)
21:28:36 %VPC-4-VPC_ROLE_CHANGE: In domain 2, VPC role status has changed to secondary
21:35:40 %VPC-5-VPC_PEER_LINK_UP: vPC Peer-link is up
VPC operational considerations … from troubleshooting perspective
VPC troubleshooting is often part of investigation of larger scale event – connectivity issues following power-outage, upgrade, migration, major changes etc
Datacenter connectivity being impacted usually implies lots of pressure (time and otherwise…)
Always know the current situation before trying to ‘recover’…
Trying to fix a non-issue risks making things worse… At minimum collect the state of the system before trying anything drastic
When traffic forwarding is concerned, basic information on interfaces, VPC states, STP states, MAC addresses and L3 routes/ARPs is essential. It takes a minute to collect: just paste this into the shell on both peers
term len 0
sh int
sh vpc
sh port-channel summary
sh spanning-tree
sh mac address-table
sh routing vrf all
sh ip arp vrf all
‘sh tech detail’ is preferred (though it takes ~10 minutes to collect, depending on CPU load and number of linecards). Note: if VDCs are used, best practice is to collect ‘sh tech detail’ from both the main VDC and the VDC in question. ‘sh tech brief’ is a faster alternative
VPC config considerations
VPC Domain # must be unique for each Layer2-adjacent VPC domain – otherwise issues with multicast forwarding, LACP negotiation of cross-VPC links may arise
Set logging level for vpc to 5 – makes VPC operation easier to follow
Use LACP for the peer-link (channel-group <x> mode active) – more resilient to separate link failures (fiber/sfp going bad) or switch control-plane failures
Use auto-recovery (if available, use reload-restore if not) – useful for cases of multiple failures, more graceful recovery
Agenda
Initialization & Redundancy considerations
Spanning Tree
Traffic forwarding
1st hop redundancy
Multicast considerations
Spanning Tree in VPC domain
[Diagram: Primary and Secondary switches, each running its own STP process, with callouts 1 and 2]
STP runs on both switches (2 active control planes) but only the primary switch drives STP of VPCs. Port state changes are communicated to the secondary via CFS messages.
For non-VPC ports the domain appears as 2 bridges
1. Peer-link is part of STP. BPDU handling is modified such that the Peer-link will not be blocked (similar to MST implementation of IST)
2. Non-VPC ports are managed independently by the local STP process on each switch
STP behavior upon VPC primary failure
[Diagram: Primary (STP root) and Secondary (backup root); after the failure the Secondary becomes OP-Primary and root]
1. Primary switch (STP root) fails
2. Secondary switch becomes operational primary and STP root
The STP root port doesn’t change, nor do any STP port states for VPCs; forwarding continues
Depending on control plane load it might take a few seconds for the Op-primary to start sending BPDUs. This might cause STP reconvergence on connected switches, hence increasing the hello time or using the peer-switch feature might be considered in large deployments
STP behavior upon VPC primary recovery
[Diagram: left switch returns as OP-Secondary and STP root; right switch (Secondary, OP-Primary) goes from root to backup root, with SYNC]
1. Left switch comes back up
2. Peer-Link comes back up
3. VPC role is resolved as Operational-secondary
4. Left switch has better STP priority and becomes STP root
5. STP root port of the right switch will change and that will trigger SYNC: all non-edge STP ports will be temporarily blocked
Once sync is complete ports will resume forwarding
VPC Peer-Switch feature
Both VPC switches (Primary and Secondary) originate BPDUs with preconfigured information. This keeps the BPDU the same when the primary fails or recovers: no extra SYNC is required, so the short interruption in forwarding described on the previous slide is avoided
Both left and right switches consider themselves root
Both left and right switches send BPDUs all the time: there is no need to raise the hello time, and STP Bridge Assurance can be enabled on VPCs
Configuration (identical on both switches):
spanning-tree vlan 1-1000 priority 8192
vpc domain 1
  peer-switch
VPC Peer-Switch feature
left# sh span vlan 101
VLAN0101
  Spanning tree enabled protocol rstp
  Root ID    Priority    8293
             Address     0023.04ee.be01
             This bridge is the root
  ...
  Bridge ID  Priority    8293 (priority 8192)
             Address     0023.04ee.be01
  ...
Interface        Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- ---------------
Po1              Desg FWD 1         128.4096 (vPC) P2p
Po100            Root FWD 2         128.4195 (vPC peer-link)

left# sh vpc role | i mac
vPC system-mac       : 00:23:04:ee:be:01
vPC local system-mac : 00:1b:54:c2:42:43

right# sh span vlan 101
VLAN0101
  Spanning tree enabled protocol rstp
  Root ID    Priority    8293
             Address     0023.04ee.be01
             This bridge is the root
  ...
  Bridge ID  Priority    8293 (priority 8192)
             Address     0023.04ee.be01
  ...
Interface        Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- ---------------
Po1              Desg FWD 1         128.4096 (vPC) P2p
Po100            Desg FWD 2         128.4195 (vPC peer-link)
In Peer-Switch mode bridge-ID comes from system-mac as opposed to local mac in normal mode
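The priority value 8293 in the output above is the configured priority plus the VLAN number carried as the extended system ID (8192 + 101). A short Python sketch of that arithmetic follows; the function names are hypothetical, and the 4-bit/12-bit split is the standard 802.1D/802.1Q extended system ID layout.

```python
def bridge_priority(configured_priority, vlan_id):
    """Priority as shown by 'sh span': configured priority plus sys-id-ext (VLAN)."""
    if configured_priority % 4096 != 0 or not 0 <= configured_priority <= 61440:
        raise ValueError("configured priority must be a multiple of 4096 up to 61440")
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    return configured_priority + vlan_id


def split_priority(displayed_priority):
    """Split a displayed priority into (configured priority, sys-id-ext)."""
    # Upper 4 bits of the 16-bit field hold the priority, lower 12 bits the VLAN.
    return displayed_priority & 0xF000, displayed_priority & 0x0FFF


if __name__ == "__main__":
    print(bridge_priority(8192, 101))
    print(split_priority(8293))
```

Because both peers present the same priority and the same system-mac in Peer-Switch mode, connected switches see a single bridge ID no matter which peer sends the BPDU.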
STP inconsistencies
When STP detects certain abnormal situations it will mark ports as inconsistent and block them to prevent forwarding loops
- Root – Root Guard feature detected inconsistency (unwanted bridge tries to become root)
- Loop – Loop Guard feature detected inconsistency (port becomes designated because no BPDUs are being received)
- Bridge Assurance (BA) (no BPDUs are received from remote side)
- VPC Peer-link (any of above inconsistencies happened on VPC peer-link)
%STP-2-VPC_PEER_LINK_INCONSIST_BLOCK: vPC peer-link detected BPDU receive timeout blocking port-channel11 VLAN0121.
Handling Peer-Link STP inconsistencies on Primary switch
[Diagram: Primary and Secondary switches; the peer-link is marked inconsistent on the Primary side, callout 1]
1. When peer-link STP inconsistency is detected on the primary switch the link will be put in ‘inconsistent’ STP state (effectively blocking state)
BPDUs are not sent on the peer-link when it is inconsistent. This is to allow the secondary switch to detect the inconsistency and react
Handling Peer-Link STP inconsistencies on Secondary switch
[Diagram: Primary and Secondary switches; the peer-link (callout 1) and the VPCs (callout 2) are marked inconsistent]
1. When peer-link STP inconsistency is detected on the secondary switch the peer-link will be put in ‘inconsistent’ STP state (effectively blocking state)
2. Respective vlans or MST instances are also blocked on all VPCs
This behavior depends on STP Bridge Assurance on peer-link (default) as a way to signal to the secondary peer about inconsistency
With BA disabled on Peer-link any inconsistency on the Primary will lead to Peer-link flap
STP troubleshooting: PES/SPS & BPDU redirection
The primary VPC peer controls the port states on the secondary peer by means of SPS (set-port-state) messages
Changes in STP information are synchronized between peers using PES (port-event-sync) messages
nexus# sh spanning-tree internal info vpc | exc 0$
...
======= CFSoe Statistics =========================
Total PES Msgs sent : 4
Total SPS Msgs sent : 4
Total MCS Msgs sent : 8
Total PES Response Msgs received : 4
Total SPS Response Msgs received : 4
Total Response Msgs received : 8
nexus# sh system internal frame traffic | i BPDU
Ingress BPDUs qualified for redirection 42
Ingress BPDUs redirected to peer 42
Egress BPDUs qualified for redirection 0
Egress BPDUs dropped due to remote down 0
Egress BPDUs redirected to peer 0
BPDUs are sent to VPCs out of the primary switch. If the VPC leg connected to the primary is down, BPDUs are sent over the peer-link and sent out by the secondary
Constantly incrementing SPS/PES counters might indicate STP instability or constant reconvergence. Use ‘sh spanning detail’ and ‘debug spanning-tree events’ to find the reason for reconvergences
STP troubleshooting
Peer link is running STP
vpc1# sh spanning-tree vlan 4
VLAN0004
  Spanning tree enabled protocol rstp
  Root ID    Priority    32772
             Address     0018.ba88.4a00
             Cost        2
             Port        4096 (port-channel1)
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec
  Bridge ID  Priority    32772 (priority 32768 sys-id-ext 4)
             Address     68bd.abd7.51c2
             Hello Time  2 sec  Max Age 20 sec  Forward Delay 15 sec
Interface        Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
Po1              Root FWD 1         128.4096 (vPC peer-link) Network P2p
Po102            Root FWD 1         128.4197 (vPC) P2p
vpc1# sh spanning-tree vlan 4 detail | i "^ Port|BPDU"
Port 4096 (port-channel1, vPC Peer-link) of VLAN0004 is root forwarding
BPDU: sent 46416, received 46418
Port 4197 (port-channel102, vPC) of VLAN0004 is root forwarding
BPDU: sent 0, received 0
On the other end of the peer-link po1 is designated
It is possible to see a situation where there are 2 root ports: the peer-link and some VPC. This happens when the STP root is behind the VPC and the BPDU is received by the peer. It does not indicate any issue
STP troubleshooting: looking at BPDUs live
vpc1# debug spanning-tree bpdu_tx tree 101
14:20:37.556707 stp: RSTP(101): transmitting RSTP BPDU on port-channel100
14:20:37.556750 stp: vb_vlan_shim_send_bpdu(1933): VDC 4 Vlan 101 port port-channel100 enc_type 1 len 42
14:20:37.556834 stp: RSTP(101): transmitting RSTP BPDU on port-channel1
14:20:37.556863 stp: vb_vlan_shim_send_bpdu(1933): VDC 4 Vlan 101 port port-channel1 enc_type 2 len 36
vpc1# debug spanning-tree all
14:22:23.560147 stp: RSTP(1): transmitting RSTP BPDU on port-channel100
14:22:23.560169 stp: vb_vlan_shim_send_bpdu(1933): VDC 4 Vlan 1 port port-channel100 enc_type 2 len 36
14:22:23.560219 stp: BPDU TX: vb 1 vlan 1 port port-channel100 len 36 ->0180c2000000 CFG P:0000 V:02 T:02 F:78 R:80:01:00:1b:54:c2:42:43 00000002 B:80:01:00:1b:54:c2:42:44 9063 A:0000 M:0014 H:0002 F:000f
Looking at past events…
nexus# sh spanning-tree internal event-history tree 0 interface port-channel 50
VDC02 MST0000 <port-channel50>
0) Transition at 497772 usecs after Tue Oct 20 17:42:01 2009
   State: FWD Role: Root Age: 5 Inc: no [STP_PORT_STATE_CHANGE]
1) Transition at 661395 usecs after Tue Oct 20 17:42:01 2009
   State: FWD Role: Root Age: 4 Inc: no [STP_PORT_ROLE_CHANGE]
2) Transition at 17741 usecs after Tue Oct 20 17:42:03 2009
   State: BLK Role: Root Age: 5 Inc: no [STP_PORT_STATE_CHANGE]
...
‘debug spanning-tree bpdu_tx’ output can be easily limited to the necessary Vlan/Interface, but it doesn’t dump the BPDU
‘debug spanning-tree all’ is very chatty: use ‘debug logfile <file>’ to redirect output to a file
Alternatively use ‘ethanalyzer’ to capture and dump BPDUs. Beware that BPDUs received by the other peer and redirected to the primary will not be seen in the expected way because of the extra encapsulation
Layer2 stability features recap
Feature: UDLD
  Condition: Detects if a link becomes unidirectional, i.e. the link cannot carry BPDUs both ways, which causes loops
  Works on: Physical port
  Effect: Error-disables unidirectional links
  Note: Useful on port-channels to take out broken links; the alternative is fast-timers PAGP/LACP

Feature: Bridge Assurance (BA)
  Condition: Expects to receive a BPDU every hello_time from the peer, i.e. cases of dead control plane on the remote side, also BPDU loss
  Works on: Logical port
  Effect: Blocks port at STP level (BA-inconsistent state)
  Note: Main protection mechanism where supported; the alternative is Loop Guard

Feature: Dispute
  Condition: Checks the remote port role in the received BPDU; the role should not be designated in a BPDU received on a designated port. Cases of unidirectional communication
  Works on: Logical port
  Effect: Blocks port at STP level (Disputed state)
  Note: Complements BA, on by default. Somewhat overlaps with UDLD, but not as effective on port-channels. Only works with RSTP/MST BPDUs

Feature: Loop Guard
  Condition: Doesn’t allow a port to take the designated role if it stopped receiving BPDUs. Unidirectional communication, control plane issues on remote
  Works on: Logical port
  Effect: Blocks port at STP level (Loop-inconsistent)
  Note: Superseded by BA + Dispute; use with PVST+ or when BA is not supported
Bridge assurance, Dispute & UDLD
BA is default enabled on Peer-Link, not recommended for VPCs unless Peer-Switch feature is also operational
Dispute is default enabled (for both RSTP and MST on VPC)
UDLD [normal mode] is recommended to take out bad links from channels (otherwise LACP takes ~100sec vs ~20 with UDLD)
Recommendation
Preferred: BA + UDLD + Dispute (on all interswitch links when using Peer-switch) when all switches support this (Nexus 7000/5000 and Cat6500/VSS do support it)
Without Peer-switch, BA should be kept only on Peer-Link (no BA or Loop Guard on VPCs); use UDLD + Dispute
If the preferred config is not supported, use Loop Guard + UDLD (supported by all Cisco switches)
Supported features can potentially be mixed and matched per switch, but understand which failure cases each feature covers in each combination
Special case for forwarding
1. PC A sends a packet to PC B
2. MAC B is not known by left switch: flood
3. MAC B is not known by right switch: flood
4. B receives duplicate frames
5. MAC A will be learned on the wrong port on the lower access switch, blackholing traffic to A
Frames received on Peer-Link must not be flooded out of VPCs
Special case for forwarding: VPC way
1. MAC B is not known by left switch: flood
2. Frames received from Peer-Link are never sent out of VPC (except VPCs that have no operational ports on the ingress switch). Egress port ASICs will drop the frame
3. Frame is still flooded to devices that are solely connected to egress switch
This rule (called ‘VPC check’) stands for all traffic (L2, L3, unicast, multicast, broadcast, flooded etc) on Nexus 7000 (Nexus 3000/5000 VPC have a similar rule, but a different implementation)
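The VPC check described above can be sketched as a small decision function. This is only an illustrative model under stated assumptions: the class `EgressPort`, its fields and the function `may_egress` are hypothetical names, and the real check is performed by the egress port ASICs, not in software.

```python
from dataclasses import dataclass


@dataclass
class EgressPort:
    """Hypothetical model of a candidate egress port on the receiving peer."""
    name: str
    is_vpc_member: bool
    # True if the same VPC has operational member ports on the other peer,
    # i.e. on the switch where the frame originally entered the domain.
    vpc_up_on_peer: bool = False


def may_egress(arrived_on_peer_link: bool, port: EgressPort) -> bool:
    """Return True if the frame may leave through `port` under the VPC check."""
    if not arrived_on_peer_link:
        return True          # the rule only applies to frames from the Peer-Link
    if not port.is_vpc_member:
        return True          # devices solely connected to this switch still get the frame
    # VPC member port: allowed only if the ingress peer has no operational
    # ports in this VPC, so it could not deliver the frame locally.
    return not port.vpc_up_on_peer


if __name__ == "__main__":
    print(may_egress(True, EgressPort("Po10", True, vpc_up_on_peer=True)))   # False
    print(may_egress(True, EgressPort("Po20", True, vpc_up_on_peer=False)))  # True
    print(may_egress(True, EgressPort("Eth1/1", False)))                     # True
```

The same function covers the flooding case from the slide: a flooded frame that crossed the Peer-Link is evaluated once per candidate egress port.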
Summary: VPC traffic forwarding with Nexus 7000
Topologies where VPC forwarding rules will have implications
(Diagram, left: OSPF router attached to both peers over a VPC; diagram, right: vlan 2 on a VPC, one peer with SVI 1 up and SVI 2 up, the other with SVI 1 up and SVI 2 down; routed links)
Packets arriving to the left switch with the destination MAC of the right switch will be dropped
With peer-gateway enabled adjacencies may not come up
This issue is not specific to OSPF – same for any routing protocol
Use routed links to connect routers
Configuration and operational state of SVI interfaces for vlans present on VPCs should be consistent
Otherwise a packet arriving to the left switch for a destination on a VPC in vlan 2 will have to cross the Peer-Link and will be dropped by the right switch
Add a routed cross-link between peers
Frames received from Peer-Link are never sent out of VPC (except VPCs that have no operational ports on the ingress switch)
Verifying whether frame will be sent to peer-link
Nexus# show mac address-table vlan 35
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link
VLAN MAC Address Type age Secure NTFY Ports
---------+-----------------+--------+---------+------+------+----------------
+ 35 0007.b400.0101 dynamic 0 False False Po1
G 35 0007.b400.0102 static - False False sup-eth1(R)
G 35 001b.54c2.4241 static - False False sup-eth1(R)
* 35 001b.54c2.4244 static - False False vPC Peer-Link
+ 35 0012.da65.9ec0 dynamic 0 False False Po1
If frame arrives to this switch in vlan 35 destined to 001b.54c2.4244 it will be sent to peer-link
If this MAC address belongs to one of L3 SVI interfaces of peer-switch and IP destination of the frame is behind the VPC and this VPC has active links on this (local) switch then frame will be dropped by peer-switch
Verify where the destination MAC address of the frame points to
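As a rough aid for the check above, the following sketch parses a pasted `show mac address-table` capture and reports where a destination MAC points. It assumes the eight-column row layout of the sample on this slide; the function names `parse_mac_table` and `classify` are hypothetical, and other NX-OS releases may print different columns.

```python
# Sample rows copied from the slide; header and separator are skipped by the parser.
SAMPLE = """\
VLAN MAC Address Type age Secure NTFY Ports
---------+-----------------+--------+---------+------+------+----------------
+ 35 0007.b400.0101 dynamic 0 False False Po1
G 35 0007.b400.0102 static - False False sup-eth1(R)
G 35 001b.54c2.4241 static - False False sup-eth1(R)
* 35 001b.54c2.4244 static - False False vPC Peer-Link
+ 35 0012.da65.9ec0 dynamic 0 False False Po1
"""


def parse_mac_table(text):
    """Return {(vlan, mac): {"flag": ..., "ports": ...}} for every entry row."""
    entries = {}
    for line in text.splitlines():
        # flag, vlan, mac, type, age, secure, ntfy, ports (ports may contain spaces)
        fields = line.split(None, 7)
        if len(fields) != 8 or not fields[1].isdigit():
            continue  # legend, header or separator line
        flag, vlan, mac, _type, _age, _secure, _ntfy, ports = fields
        entries[(int(vlan), mac.lower())] = {"flag": flag, "ports": ports.strip()}
    return entries


def classify(entries, vlan, mac):
    """Describe what this switch does with a frame destined to `mac` in `vlan`."""
    entry = entries.get((vlan, mac.lower()))
    if entry is None:
        return "unknown (flood)"
    if entry["flag"] == "G":
        return "routed locally"
    if entry["ports"] == "vPC Peer-Link":
        return "sent to peer-link"
    return "bridged to " + entry["ports"]


if __name__ == "__main__":
    table = parse_mac_table(SAMPLE)
    print(classify(table, 35, "001b.54c2.4244"))
```

A result of "sent to peer-link" for a router MAC of the peer is the situation in which the VPC check on the peer can drop the routed packet.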
MAC address learning
(Diagram: PC A behind the lower VPC, PC B above)
1. MAC A is learned on lower VPC
2. MAC A is learned on Peer-Link
3. Frame destined to A arriving to right switch will be sent to Peer-Link
Traffic should prefer local links when available (traffic locality rule)
MAC address learning: VPC way
(Diagram: PC A behind the lower VPC, PC B above; CFS message from left switch to right switch)
1. MAC A is learned on lower VPC
2. Left switch sends a CFS message to right switch telling about MAC A learned on lower VPC. Right switch updates MAC address table
3. Frame destined to A arriving to right switch will be sent to lower VPC
MAC addresses are never learned from traffic on Peer-Link
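The learning rule above can be illustrated with a toy model of two peers. This is a sketch under simplifying assumptions: the class `Peer` and its methods are hypothetical, the CFS message is modelled as a direct method call, and only MACs learned on VPC member ports are considered.

```python
class Peer:
    """Toy model of one VPC peer's MAC table (not the NX-OS implementation)."""

    def __init__(self, name):
        self.name = name
        self.mac_table = {}   # MAC -> port the MAC points to
        self.peer = None      # the other VPC peer, set after both are created

    def receive(self, src_mac, ingress_port):
        """Data-plane learning for a frame received on `ingress_port`."""
        if ingress_port == "peer-link":
            return  # MAC addresses are never learned from traffic on Peer-Link
        self.mac_table[src_mac] = ingress_port
        if self.peer is not None:
            # Stand-in for the CFS message telling the peer about the new MAC.
            self.peer.on_cfs_mac_update(src_mac, ingress_port)

    def on_cfs_mac_update(self, mac, vpc_port):
        """Install the MAC on the same VPC, so traffic stays on local links."""
        self.mac_table[mac] = vpc_port


if __name__ == "__main__":
    left, right = Peer("left"), Peer("right")
    left.peer, right.peer = right, left
    left.receive("A", "Po10")          # step 1: learned on the lower VPC
    print(right.mac_table)             # step 2: right peer now points A to Po10
```

With both tables pointing at the VPC, a frame for A arriving on the right switch uses its local VPC member link instead of the Peer-Link.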
Troubleshooting Layer 2
20.1.2.3 91.0.0.10
0013.1908.e246
Po50
Vlan 50
Po22
Vlan 20
nexus# sh mac address-table address 0013.1908.e246 vlan 50
  VLAN     MAC Address      Type     age      Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
* 50       0013.1908.e246   dynamic  0        F      F    Po50
nexus# sh spanning-tree vlan 50 interface port-channel 50
Mst Instance     Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
MST0002          Desg FWD 200       128.4145 (vPC) P2p
nexus# sh hardware mac address-table 2 address 0013.1908.e246 vlan 50
Valid| PI | BD    | MAC           | Index  | Stat| SW | Modi| Age | Tmr |
     |    |       |               |        | ic  |    | fied| Byte| Sel |
-----+----+-------+---------------+--------+-----+----+-----+-----+-----+
1      1    161     0013.1908.e246  0x00a36  0     3    0     141   1
nexus# sh system internal pixm info ltl 0x00a36 | i Eth.*,
0x0a36 Eth2/36,
nexus# sh mac address-table address 0021.55e0.66c2 vlan 20
  VLAN     MAC Address      Type     age      Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
* 20       0021.55e0.66c2   dynamic  660      F      F    Po22
nexus# sh spanning-tree vlan 20 interface port-channel 22
Mst Instance     Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
MST0000          Desg FWD 200       128.4117 (vPC) Network P2p
nexus# sh hardware mac address-table 1 address 0021.55e0.66c2 vlan 20
Valid| PI | BD    | MAC           | Index  | Stat| SW | Modi| Age | Tmr |
     |    |       |               |        | ic  |    | fied| Byte| Sel |
-----+----+-------+---------------+--------+-----+----+-----+-----+-----+
1      1    18      0021.55e0.66c2  0x00a32  0     2    0     103   1
nexus# sh system internal pixm info ltl 0x00a32 | i Eth.*,
0x0a32 Eth1/13, Eth1/14,
MAC addresses should point to expected ports in expected vlans (path towards source)
The ports should be in STP forwarding mode
Hardware MAC address table should be consistent with software table (the number after ‘sh hardware mac address-table’ is the linecard slot number)
Finding port# for given index: ‘sh system internal pixm info ltl <index>’
Troubleshooting Layer 3
nexus# sh routing ip 20.1.2.3
...
20.1.2.3/32, ubest/mbest: 1/0
    *via 20.1.1.240, Vlan20, [1/0], 03:48:59, static
nexus# sh ip arp 20.1.1.240
Address         Age       MAC Address     Interface
20.1.1.240      00:02:17  0021.55e0.66c2  Vlan20
nexus# sh forwarding ip route 20.1.2.3 module 2
...
------------------+------------------+---------------------
Prefix            | Next-hop         | Interface
------------------+------------------+---------------------
20.1.2.3/32         20.1.1.240         Vlan20
nexus# sh forwarding adjacency 20.1.1.240 module 2
IPv4 adjacency information
next-hop        rewrite info     interface
--------------  ---------------  -------------
20.1.1.240      0021.55e0.66c2   Vlan20
nexus# sh int vl 20 | i address
  Hardware is EtherSVI, address is 0023.ac66.1a42
nexus# sh mac address-table address 0023.ac66.1a42 vlan 20
  VLAN     MAC Address      Type     age      Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
G 20       0023.ac66.1a42   static   -        F      F    sup-eth1(R)
Is there a route to the destination?
Is the next hop resolved?
Looking at module 2 because this is where packets in question should be received
Is adjacency consistent with ARP?
Router MAC must have Gateway flag in order for packet to be L3 switched
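The checks on this slide can be written down as one consistency function over values read by hand from the commands shown. This is a hypothetical helper, not an NX-OS tool: the name `l3_findings` and its parameters are assumptions, and the inputs are expected to be copied manually from the CLI output.

```python
def l3_findings(rib_next_hop, arp_mac, fib_next_hop, adj_mac, router_mac_flag):
    """Return a list of problems found while following the Layer 3 checklist.

    rib_next_hop    next hop from 'sh routing ip <dst>' (None if no route)
    arp_mac         MAC from 'sh ip arp <next-hop>' (None if unresolved)
    fib_next_hop    next hop from 'sh forwarding ip route <dst> module <n>'
    adj_mac         rewrite MAC from 'sh forwarding adjacency <next-hop> module <n>'
    router_mac_flag flag column of the SVI MAC in 'sh mac address-table'
    """
    findings = []
    if rib_next_hop is None:
        findings.append("no route to destination")
    if arp_mac is None:
        findings.append("next hop not resolved by ARP")
    if fib_next_hop != rib_next_hop:
        findings.append("hardware route differs from routing table")
    if adj_mac != arp_mac:
        findings.append("adjacency rewrite MAC differs from ARP")
    if router_mac_flag != "G":
        findings.append("router MAC has no Gateway flag")
    return findings


if __name__ == "__main__":
    # Values taken from the outputs on this slide: everything is consistent.
    print(l3_findings("20.1.1.240", "0021.55e0.66c2",
                      "20.1.1.240", "0021.55e0.66c2", "G"))
```

An empty list means the software and hardware views agree for this destination; any entry points at the layer to investigate next.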
Where given packet will be load-balanced
For equal-cost routes
nexus# sh routing hash 91.0.0.10 20.1.2.3
Load-share parameters used for software forwarding:
load-share mode: address source-destination port source-destination
Universal-id seed: 0xcdb5769f
Hash for VRF "default"
Hashing to path *20.1.1.3 (hash: 0x2a), for route:
20.1.2.3/32, ubest/mbest: 2/0
    *via 20.1.1.3, Vlan20, [1/0], 00:01:37, static
    *via 20.1.1.240, Vlan20, [1/0], 16:32:42, static
Load-balancing is configurable under ‘ip load-sharing address’ in default VDC and affects all VDCs
For port-channels
nexus# sh port-channel load-balance forwarding-path interface port-channel 22 dst-ip 20.1.2.3 src-ip 91.0.0.10 vlan 20 module 2
Missing params will be substituted by 0's.
Module 2: Load-balance Algorithm: source-dest-ip-vlan
RBH: 0 Outgoing port id: Ethernet1/14
Load-balancing is configurable under ‘port-channel load-balance’ in default VDC and affects all VDCs
Use ‘sh port-channel rbh-distribution’ to see which link sends traffic for which of 8 available load-balancing ‘buckets’
Datapath Drops
nexus# sh hardware internal errors all
----------------------------------------
Hardware errors as reported in module 1
----------------------------------------
|------------------------------------------------------------------------|
| Device:R2D2 Role:MAC                                                   |
|------------------------------------------------------------------------|
Instance:7
ID    Name                                        Value             Ports
--    ----                                        -----             -----
28688 aric_no_port_select_error                   0000000000000002  1,3,5,7 I2
...
|------------------------------------------------------------------------|
| Device:Ashburton Role:MAC Mod: 1                                       |
|------------------------------------------------------------------------|
Instance:0
3629  Egress Port-1 VSL Dropped Packet Count      0000000853635833  5 -
3630  Egress Port-2 VSL Dropped Packet Count      0000000857893046  3 -
...
|------------------------------------------------------------------------|
| Device:Naxos Role:MAC SECURITY                                         |
|------------------------------------------------------------------------|
Instance:0
ID    Name                                        Value             Ports
--    ----                                        -----             -----
106   m1_fab_p25_txq_tc0_drop_count               00000000000012af  2 -
...
|------------------------------------------------------------------------|
| Device:Metropolis Role:REWR                                            |
|------------------------------------------------------------------------|
Instance:1
ID    Name                                        Value             Ports
--    ----                                        -----             -----
70    Krypton input controller zero portsel cnt   0000000000000038  18,20,22,24,26,28,30,32
|------------------------------------------------------------------------|
| Device:Lamira Role:L3                                                  |
|------------------------------------------------------------------------|
Instance:0
ID    Name                                        Value             Ports
--    ----                                        -----             -----
93    CL2 Invalid Pkt count                       00000008759cb9cb  1-32 I1
...
#1 command to look for hardware packet drops
Not every drop listed here is actual data packet drop
Run several times to see if any counters increase at rate similar to traffic loss
To clear counters, use ‘clear statistics module-all device all’
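The advice to run the command several times and compare growth against the observed loss can be sketched as a snapshot diff. This is a minimal illustration, assuming the counter values have already been extracted into dictionaries of integers; the function names and the 50% tolerance are hypothetical choices, not part of NX-OS.

```python
def growing_counters(before, after, interval_s):
    """Return {counter name: increase per second} for counters that grew."""
    rates = {}
    for name, new_value in after.items():
        delta = new_value - before.get(name, 0)
        if delta > 0:
            rates[name] = delta / interval_s
    return rates


def suspects(rates, loss_pps, tolerance=0.5):
    """Names of counters whose growth rate is within `tolerance` of the loss rate."""
    low = loss_pps * (1 - tolerance)
    high = loss_pps * (1 + tolerance)
    return sorted(name for name, rate in rates.items() if low <= rate <= high)


if __name__ == "__main__":
    # Hypothetical values for two snapshots taken 10 seconds apart.
    snap1 = {"L3 Non-Rpf Drop Pkt ctr": 1000, "L3 Fib Miss Pkt ctr": 7}
    snap2 = {"L3 Non-Rpf Drop Pkt ctr": 3000, "L3 Fib Miss Pkt ctr": 7}
    rates = growing_counters(snap1, snap2, interval_s=10)
    print(suspects(rates, loss_pps=180))
```

Counters that do not move between snapshots, or that grow at a rate unrelated to the loss, are unlikely to explain the problem being investigated.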
1st hop redundancy with VPC
(Diagram: left peer has Router MAC1 0001.0002.0003, right peer has Router MAC2 0005.0006.0007, both hold HSRP Virtual MAC 0000.0c07.ac00. PC A sends MAC_A → vMAC, IP A → IP B; PC B sends MAC_B → vMAC, IP B → IP A)
HSRP/VRRP/GLBP used for 1st hop redundancy
Each of VPC peers will L3 forward packets destined to its respective Router MAC address
Both switches will L3 switch packets to vMAC address as long as one of them is HSRP active or HSRP standby.
If both switches are HSRP listening, they will not L3 switch packets to vMAC
First hop redundancy troubleshooting
HSRP
Standby peer (Nexus):
Interface Vlan1
  ip address 1.1.1.252/24
  hsrp 1
    ip 1.1.1.254
Nexus# sh hsrp brief
Interface  Grp Prio P State    Active addr  Standby addr  Group addr
Vlan1      1   100    Standby  1.1.1.253    local         1.1.1.254
Nexus# sh mac address-table address 0000.0c07.ac01
  VLAN     MAC Address      Type     age  Secure NTFY   Ports
---------+-----------------+--------+-----+------+------+-----------
G 1        0000.0c07.ac01   static   -    False  False  sup-eth1(R)
Active peer (Nexus2):
Interface Vlan1
  ip address 1.1.1.253/24
  hsrp 1
    ip 1.1.1.254
Nexus2# sh hsrp brief
Interface  Grp Prio P State    Active addr  Standby addr  Group addr
Vlan1      1   100    Active   local        1.1.1.252     1.1.1.254
Nexus2# sh mac address-table address 0000.0c07.ac01
  VLAN     MAC Address      Type     age  Secure NTFY   Ports
---------+-----------------+--------+-----+------+------+-----------
G 1        0000.0c07.ac01   static   -    False  False  sup-eth1(R)
Both peers will L3 forward packets destined to vMac address as long as either peer in VPC domain is in ‘active’ or ‘standby’ state for corresponding group
Virtual mac address (vMac) will be installed in both peers
‘G’ (gateway) flag must be present on any MAC address for which the nexus is expected to L3 forward packets
Only active will respond to ARP for VIP
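To know which MAC address to look up, the HSRP version 1 virtual MAC can be derived from the group number: it is 0000.0c07.acXX with XX the group in hexadecimal, which matches group 1 and 0000.0c07.ac01 in the output above. The sketch below also encodes the forwarding rule from this slide; both function names are hypothetical.

```python
def hsrp_v1_vmac(group: int) -> str:
    """Virtual MAC for an HSRP version 1 group, in Cisco dotted notation."""
    if not 0 <= group <= 255:
        raise ValueError("HSRP version 1 group numbers are 0-255")
    return f"0000.0c07.ac{group:02x}"


def peers_forward_vmac(local_state: str, peer_state: str) -> bool:
    """True if the VPC peers L3 forward traffic sent to the vMAC.

    Encodes the rule stated on the slide: forwarding happens as long as
    either peer is 'active' or 'standby' for the group.
    """
    forwarding_states = {"active", "standby"}
    return any(state.lower() in forwarding_states
               for state in (local_state, peer_state))


if __name__ == "__main__":
    print(hsrp_v1_vmac(1))                          # 0000.0c07.ac01
    print(peers_forward_vmac("Standby", "Active"))  # True
    print(peers_forward_vmac("Listen", "Listen"))   # False
```

The derived address is the one expected to appear with the ‘G’ flag on both peers in `sh mac address-table`.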
1st hop issue with some devices
(Diagram: left peer has Router MAC1 0001.0002.0003, right peer has Router MAC2 0005.0006.0007, both hold Virtual MAC 0000.0c07.ac00. Frames shown as source → destination: PC A sends MAC_A → vMAC, IP A → IP B; left switch sends Router MAC1 → MAC_B, IP A → IP B; Server B replies MAC_B → Router MAC1, IP B → IP A)
1. PC A sends a packet to Server B
2. Left VPC switch will receive the packet and forward it to Server B; note that the source MAC of the outgoing packet will be that of Router1
3. Server B responding to PC A will populate destination MAC from source MAC of received frame (this is wrong, it should use ARP)
4. If the frame from B to A is load-balanced to the right switch, the MAC address of Router1 will point to Peer-Link and this is where the frame will be sent
5. Left switch will receive the frame from Peer-Link and drop it
Why? Frames received from Peer-Link are never sent out of VPC (except VPCs that have no operational ports on the ingress switch) - egress port ASICs will drop the frame (VPC check)
Peer-Gateway : the workaround
(Diagram: PC A and Server B; with peer-gateway each peer holds Router MAC1 0001.0002.0003, Router MAC2 0005.0006.0007 and Virtual MAC 0000.0c07.ac00. Server B replies MAC_B → Router MAC1, IP B → IP A)
With peer-gateway both peers will install router MACs of each other in L2 table which will allow them to L3 forward traffic destined to either Router MAC
1. Server B responding to PC A will populate destination MAC from source MAC of received frame (this is wrong, it should use ARP)
2. Right switch will forward packet towards destination
Peer-Gateway : the implications
(Diagram: each peer holds Router MAC1 0001.0002.0003, Router MAC2 0005.0006.0007 and Virtual MAC 0000.0c07.ac00. The top device sends MAC_B → Router MAC1, IP TOP → IP LEFT, TTL 1)
1. Top device attempts to establish OSPF adjacency with the left switch
2. If peer-gateway is enabled in the VPC domain and the OSPF unicast packet is load-balanced to the right switch, this packet will be dropped
Why? Right switch will try to L3-switch the unicast packet (because RouterMAC1 is marked as gateway MAC and destination IP is not local). As the packet has TTL==1 it will be dropped
Same applies to any other protocol that uses unicast packets with TTL==1 entering right switch but destined to left switch (or vice versa)
Routing protocol peering with devices attached to VPC domain via SVI interface is not supported. Routed interface should be used in this case
There is ‘peer-gateway exclude-vlan’ command to turn off peer-gateway on certain vlans
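The reasoning above can be traced with a small decision function for a unicast frame arriving on one peer. It is a simplified sketch: the function name `l3_decision`, its parameters and the IP addresses in the example are hypothetical, and real forwarding involves more checks than the three modelled here.

```python
def l3_decision(dst_mac, dst_ip, ttl, local_router_mac, peer_router_mac,
                local_ips, peer_gateway):
    """Return what a VPC peer does with a unicast frame, per the slide's logic."""
    gateway_macs = {local_router_mac}
    if peer_gateway:
        # peer-gateway installs the peer's router MAC as a gateway MAC too
        gateway_macs.add(peer_router_mac)
    if dst_mac not in gateway_macs:
        return "bridge"                 # e.g. across the Peer-Link to the owner
    if dst_ip in local_ips:
        return "deliver to local control plane"
    if ttl <= 1:
        return "drop: TTL expired"      # cannot be routed onwards
    return "route"


if __name__ == "__main__":
    router_mac1, router_mac2 = "0001.0002.0003", "0005.0006.0007"
    # OSPF unicast packet for the left switch, hashed to the right switch.
    packet = dict(dst_mac=router_mac1, dst_ip="10.0.0.1", ttl=1,
                  local_router_mac=router_mac2, peer_router_mac=router_mac1,
                  local_ips={"10.0.0.2"})
    print(l3_decision(peer_gateway=True, **packet))   # drop: TTL expired
    print(l3_decision(peer_gateway=False, **packet))  # bridge
```

Without peer-gateway the frame is only bridged towards the left switch, which is why the TTL problem appears specifically when peer-gateway is enabled.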
IP Multicast with VPC
(Diagram: Source S1 and RP above the peers; left peer is Primary and DR, right peer is 2ndary and Proxy-DR; Receiver behind the VPC. Labels: IGMP join, CFS:IGMP. State shown: (*,G)VPC on both peers; (S1,G)VPC on one peer, (S1,G)null on the other)
Receiver sends IGMP report (join)
Access switch sends join to right VPC peer
Right VPC peer creates (*,G) adds VPC to OIF (as proxy-DR)
IGMP is encapsulated in CFS and sent to left peer
Left peer (DR) creates (*,G) adding VPC to OIF
DR (left peer) sends PIM Join to RP
Once (S1,G) traffic starts arriving, VPC peers will resolve which one will be forwarder for that (S,G): peer with best metric to source or primary in a tie (this mechanism is specific to PIM in VPC mode, normally PIM would use assert)
Only forwarder will have OIFs populated in (S,G), the non-forwarder won’t have VPC SVIs in OIF list
Forwarder will send a copy of frame to the peer-link for receivers single-connected to other peer
Goal is to allow the peer receiving source traffic to forward it to receivers behind VPC without crossing peer-link (VPC check will drop such traffic otherwise)
IP Multicast with VPC: Prebuilt-SPT
(Diagram labels: Source S1, RP, Receiver, Primary, 2ndary, DR, New DR. State shown: (*,G)VPC on both peers, (S1,G)VPC on both peers, (S1,G)null)
In case of DR failure proxy-DR becomes DR and posts OIF-list from (*,G) to (S,G), but it will also need to pull traffic from RP/source which delays recovery
With ‘ip pim pre-build-spt’ proxy-DR will also send a PIM Join to source/RP to draw the traffic
Traffic pulled by proxy-DR will be dropped until it becomes DR – provision uplink and replication bandwidth accordingly
IP Multicast with VPC: source behind VPC
(Diagram: Source S1 behind VPC1, Receiver behind VPC2, RP above the peers; Primary peer is DR, 2ndary peer is Proxy-DR. State on both peers: (*,G)VPC2 and (S1,G)VPC2)
When Source is behind VPC both DR and Proxy-DR will add OIFs for the group to (S,G)
This is because either peer can receive source traffic and needs to be able to send it to receivers behind VPCs without crossing peer-link (to avoid dropping the traffic by VPC check)
When VPC is configured on N7K-F248XP-25 linecard (F2) there is no proxy-DR function (due to hardware specifics). Packet will be bridged to DR over peer-link (VPC check is modified accordingly for L3 multicast packets on F2 linecards)
Forwarder election in VPC
Peers do ‘metrics exchange’ over CFS for each new source
The peer that has the better metric to the source, or the primary in a tie, will be forwarder
VPC1# sh ip pim internal vpc rpf
Source: 10.0.1.1
  Pref/Metric: 110/21
  Source role: primary
  Forwarding state: Win (forwarding)
For sources behind VPC both peers will forward as they have no control on which one will get the traffic…
VPC1# sh ip pim internal vpc rpf
Source: 1.1.1.1
  Pref/Metric: 0/0
  Source role: primary
  Forwarding state: Win-force (forwarding)
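The election rule can be sketched as a comparison of the two peers' route to the source. This is an illustration only: the function name is hypothetical, and comparing preference first and metric second (lower is better) is an assumption based on how unicast routes are normally compared, since the slides only say "better metric". The Win-force case for sources behind a VPC, where both peers forward, is not modelled.

```python
def elect_forwarder(primary_route, secondary_route):
    """Pick the multicast forwarder for a source outside the VPC.

    Each argument is a (preference, metric) tuple as shown in the
    'Pref/Metric' field, e.g. (110, 21). Lower values are assumed better.
    Returns "primary" or "secondary"; the primary wins a tie.
    """
    if secondary_route < primary_route:   # tuple comparison: pref, then metric
        return "secondary"
    return "primary"


if __name__ == "__main__":
    print(elect_forwarder((110, 21), (110, 30)))  # primary has the better metric
    print(elect_forwarder((110, 30), (110, 21)))  # secondary has the better metric
    print(elect_forwarder((110, 21), (110, 21)))  # tie: primary
```

Only the elected peer is expected to show VPC SVIs in the OIF list of the (S,G) entry.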
VPC multicast: following packet flow
Are packets being switched by this entry?
Nexus# show ip mroute 239.1.2.3
(*, 239.1.2.3/32), uptime: 06:46:05, igmp pim ip static
  Incoming interface: Vlan36, RPF nbr: 36.0.0.3
  Outgoing interface list: (count: 2)
    Ethernet2/43, uptime: 03:01:36, static
    Vlan37, uptime: 06:46:05, igmp
(33.0.0.33/32, 239.1.2.3/32), uptime: 06:46:05, ip pim mrib
  Incoming interface: Vlan36, RPF nbr: 36.0.0.3
  Outgoing interface list: (count: 2)
    Ethernet2/43, uptime: 03:01:36, mrib
    Vlan37, uptime: 06:46:04, mrib
Control plane state for this group: where the information came from, whether it is stable (uptime), RPF interface
Is traffic being switched for this group?
Nexus# show ip mroute 239.1.2.3 summary software-forwarded
Total number of routes: 3
Total number of (*,G) routes: 1
Total number of (S,G) routes: 1
Total number of (*,G-prefix) routes: 1
Group count: 1, rough average sources per group: 1.0
Group: 239.1.2.3/32, Source count: 1
Source      packets   bytes      aps  pps  bit-rate     oifs
(*,G)       0         0          0    0    0.000 bps    2     sw-pkts: 0
33.0.0.33   5046908   252345396  49   200  80.053 kbps  2     sw-pkts: 1
Counters are updated about once a minute; aps is the average packet size; sw-pkts counts packets forwarded in software
Where are receivers on this vlan?
Nexus# show ip igmp snooping groups vlan 37
Type: S - Static, D - Dynamic, R - Router port
Vlan  Group Address  Ver  Type  Port list
37    */*            -    R     Vlan37
37    239.1.2.3      v2   D     Eth2/8
Following the flow: forwarding information
Nexus# show forwarding multicast route group 239.1.2.3
slot 1
=======
(*, 239.1.2.3/32), RPF Interface: Vlan36, flags: G
  Received Packets: 0 Bytes: 0
  Number of Outgoing Interfaces: 2
  Outgoing Interface List Index: 4
    Vlan37 Outgoing Packets:0 Bytes:0
    Ethernet2/43 Outgoing Packets:N/A Bytes:N/A
(33.0.0.33/32, 239.1.2.3/32), RPF Interface: Vlan36, flags:
  Received Packets: 5723369 Bytes: 366295616
  Number of Outgoing Interfaces: 2
  Outgoing Interface List Index: 4
    Vlan37 Outgoing Packets:0 Bytes:0
    Ethernet2/43 Outgoing Packets:N/A Bytes:N/A
slot 2
=======
(*, 239.1.2.3/32), RPF Interface: Vlan36, flags: G
  Received Packets: 0 Bytes: 0
  Number of Outgoing Interfaces: 2
  Outgoing Interface List Index: 4
    Vlan37 Outgoing Packets:5725816 Bytes:366452224
    Ethernet2/43 Outgoing Packets:3032294 Bytes:194066816
(33.0.0.33/32, 239.1.2.3/32), RPF Interface: Vlan36, flags:
  Received Packets: 0 Bytes: 0
  Number of Outgoing Interfaces: 2
  Outgoing Interface List Index: 4
    Vlan37 Outgoing Packets:5725816 Bytes:366452224
    Ethernet2/43 Outgoing Packets:3032294 Bytes:194066816
This is platform independent forwarding information
Slot 1 is the ingress linecard entry, slot 2 is the egress linecard entry
Counters are updated once per ~1 minute
Counters between ingress/egress do not have to match, as information is collected not at the same exact time, receiver might join after the entry was created etc
When traffic arrives via VPC
How to find which slot receives the S,G flow when ingress interface is port-channel scattered across several modules?
show forwarding multicast route group <g> source <s>
Nexus# show forwarding multicast route group 239.1.1.1 source 1.0.1.2 | i Received|slot
slot 1
  Received Packets: 0 Bytes: 0
slot 2
  Received Packets: 727203 Bytes: 487290999
Are there drops in forwarding path?
Start looking from Ingress module
Nexus# show hardware internal errors module 1
----------------------------------------
Hardware errors as reported in module 1
----------------------------------------
...
|------------------------------------------------------------------------|
| Device:Lamira Role:L3 Mod: 1                                           |
| Last cleared @ Thu Apr 8 12:57:37 2010
| Device Statistics Category :: ERROR
|------------------------------------------------------------------------|
Instance:0
ID   Name                               Value             Ports
--   ----                               -----             -----
259  L3 Fib Miss Pkt ctr                0000000000000007  1-32 I1
262  L3 Non-Rpf Drop Pkt ctr            0000000000125617  1-32 I1
319  NF2 V4 IPMAC Lkup Error            0000000000272277  1-32 I1
455  Exception cause: DROP (Unicast)    0000000000025510  1-32 I1
465  Exception cause: DROP (Multicast)  0000000000226148  1-32 I1
Always take several snapshots and look for drops that grow coherently with [suspected] multicast traffic drops
There are always some drops shown by above command – this doesn’t always mean the actual network packets are dropped. Some of these are diag packets, some are packets that are dropped on blocked ports, extra floods etc
Review & Summary
Infrastructure
Redundancy at process, supervisor, port-channel, chassis, VPC level
Both peers are needed to bring up VPCs; auto-recovery/reload-restore can change this
Peer-Keepalive + Role defines behavior during VPC failovers
Forwarding
Traffic locality (VPC check) + No learning on Peer-Link
No blocking ports (generally), but common L2 stability mechanisms still important (LACP active, UDLD, BA, Dispute)
Interfacing with L3 requires separate links + cross link
Troubleshooting
Layered, always take basic info, narrow down to a layer/issue type before trying to recover
Data plane – troubleshoot each peer like normal switch paying attention to nuances like VPC check, dual-DR and Router-MACs
Thank you.