NET1777: Troubleshooting Methodology for VMware NSX for vSphere

Hammad Alam, VMware, Inc
Shahzad Ali, VMware, Inc

#VMworld #NET1777

VMworld 2017 Content: Not for publication or distribution


Disclaimer

• This presentation may contain product features that are currently under development.

• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.

• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

• Technical feasibility and market demand will affect final delivery.

• Pricing and packaging for any new technologies or features discussed or presented have not been determined.


Session Objective: An Interactive Conversation

Customer and NSX Expert

NSX Operations Landscape

NSX Native Options
• NSX Dashboard
• Endpoint Monitoring
• NSX Central CLI
• NSX Manager GUI
• NSX RESTful API
• Traceflow

Protocols
• IPFIX
• SNMP
• Syslog
• Port Mirroring

VMware Ecosystem
• vRealize Log Insight
• vRealize Network Insight

Open Source
• PowerNSX
• PyNSXv
• PowerOps

Partner Ecosystem

Troubleshooting NSX


Troubleshooting Methodology

• NSX Understanding
• Implementation Knowledge
• vCenter NSX Web Plug-in
• NSX Native Tools
• vRealize Log Insight / Network Insight
• Central CLI
• Packet Capture
• Support for Automation

Tiers shown on the slide: Documentation + Environment Knowledge; Level-1/Level-2; Level-3/Level-4

Understanding NSX Components

Management Plane: NSX Manager
▪ Single configuration portal
▪ REST API entry-point

Control Plane: Controller Cluster
▪ Manages Logical networks
▪ Control-Plane Protocol
▪ Separation of Control and Data Plane

Data Plane: NSX Edge VM, ESXi Hypervisor Kernel Modules (Distributed Services)
▪ High-Performance Data Plane
▪ Scale-out Distributed Forwarding Model

Other diagram labels: Logical Network, Physical Network, DFW, DLR, Logical Switch, VPN, DLR Control VM

Implementation Knowledge

[Diagram: deployment options across Host1 and Host2: DLR, ESG ECMP, ESG HA Mode, Software L2 Bridge]

UI Based Troubleshooting: Level 1/Level 2

NSX UI

Flow Monitoring

Endpoint Monitoring

Application Rule Manager

Traceflow

Log Insight – Dashboards

vRNI: Object Path

vRNI: Alerts for New Events

CLI Based Troubleshooting: Level 3/Level 4

Dynamic Routing Route Propagation in NSX – Control Plane

Topology (from the slide diagram)
• NSX Edges in ECMP (acting as next hop router): 192.168.255.1, .2, .3
• Logical Router: Forwarding Address 192.168.255.5, Protocol Address 192.168.255.4
• Logical Router interfaces: 10.145.225.1, 10.145.225.65
• VM-A: 10.145.225.2/26, VM-B: 10.145.225.66/26
• Other components: NSX Mgr, Controller Cluster, Logical Router Control VM, ESXi, External Network

Steps
1. DLR Control VM is deployed by NSX Manager. DLR instance information is sent to Controller and Hosts.
2. Controller pushes DLR configuration, including LIFs, to ESXi Hosts.
3. Dynamic Routing Peering is established between the Edge(s) and DLR Control VM. DLR's Protocol address is used for Control Communication.
4. Routes learnt by DLR Control VM are sent to Controller for distribution.
5. Controller sends the route updates to all ESXi Hosts.
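The five control-plane steps on this slide can be mimicked with a toy model. This is only an illustrative sketch, not NSX code: the class names (`Host`, `Controller`), their methods, and the `propagate` helper are hypothetical, and the addresses are the sample values from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical stand-in for an ESXi host running a DLR instance."""
    name: str
    lifs: list = field(default_factory=list)    # filled in step 2
    routes: dict = field(default_factory=dict)  # filled in step 5


@dataclass
class Controller:
    """Hypothetical stand-in for the controller cluster."""
    hosts: list
    routes: dict = field(default_factory=dict)

    def push_config(self, lifs):
        # Step 2: push DLR configuration, including LIFs, to every host.
        for host in self.hosts:
            host.lifs = list(lifs)

    def receive_routes(self, learned):
        # Step 4: routes learnt by the DLR Control VM arrive here.
        self.routes.update(learned)
        # Step 5: route updates are sent to all hosts.
        for host in self.hosts:
            host.routes.update(learned)


def propagate():
    hosts = [Host("host-454"), Host("host-456")]
    controller = Controller(hosts)
    # Steps 1-2: the DLR instance exists and its LIFs reach every host.
    controller.push_config(["10.145.225.1/26", "10.145.225.65/26"])
    # Step 3: the Control VM peers with the ECMP edges and learns a default route.
    learned = {"0.0.0.0/0": ["192.168.255.1", "192.168.255.2", "192.168.255.3"]}
    controller.receive_routes(learned)
    return controller, hosts
```

The model makes one point visible: hosts never peer with the Edges themselves; they only receive what the controller distributes, so a route missing on a host can be traced back hop by hop.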

NSX Data Plane

Topology (from the slide diagram)
• NSX Edges in ECMP: 192.168.255.1, .2, .3
• Logical Router forwarding address: 192.168.255.5
• Logical Router interfaces: 10.145.225.1, 10.145.225.65
• VM-A: 10.145.225.2/26, VM-B: 10.145.225.66/26
• External Network, ESXi

1. DLR instance on each host maintains the Forwarding Table.

Destination     GenMask           Gateway         Interface
-----------     -------           -------         ---------
0.0.0.0         0.0.0.0           192.168.255.3   1b5800000002
0.0.0.0         0.0.0.0           192.168.255.2   1b5800000002
0.0.0.0         0.0.0.0           192.168.255.1   1b5800000002
10.145.225.0    255.255.255.192   0.0.0.0         1b580000000e
10.145.225.64   255.255.255.192   0.0.0.0         1b580000000f
{truncated..}

2. East-West (E-W) routing happens between VMs on different Logical Switches.

3. For North-South (N-S) traffic, ESG's next-hop is DLR's Forwarding IP Address.

O N2 10.145.225.0/26 via 192.168.255.5
O N2 10.145.225.64/26 via 192.168.255.5
{truncated …}
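How a host might choose an entry from the forwarding table above can be sketched as a longest-prefix match. This is an assumption for illustration: the slides do not describe the lookup algorithm or how one of several equal-cost default routes is selected, so the hypothetical `lookup` helper simply returns every candidate at the best prefix length.

```python
import ipaddress

# Rows copied from the host forwarding table on the slide:
# (destination, genmask, gateway, interface)
TABLE = [
    ("0.0.0.0", "0.0.0.0", "192.168.255.3", "1b5800000002"),
    ("0.0.0.0", "0.0.0.0", "192.168.255.2", "1b5800000002"),
    ("0.0.0.0", "0.0.0.0", "192.168.255.1", "1b5800000002"),
    ("10.145.225.0", "255.255.255.192", "0.0.0.0", "1b580000000e"),
    ("10.145.225.64", "255.255.255.192", "0.0.0.0", "1b580000000f"),
]


def lookup(dst):
    """Return all (gateway, interface) candidates at the longest matching prefix."""
    addr = ipaddress.ip_address(dst)
    best_len, matches = -1, []
    for dest, mask, gateway, interface in TABLE:
        net = ipaddress.ip_network(f"{dest}/{mask}")
        if addr not in net:
            continue
        if net.prefixlen > best_len:
            best_len, matches = net.prefixlen, [(gateway, interface)]
        elif net.prefixlen == best_len:
            matches.append((gateway, interface))
    return matches
```

With these rows, a lookup for VM-B (10.145.225.66) resolves to the connected interface with gateway 0.0.0.0 (East-West), while an external destination falls through to the three default routes towards the Edges (North-South).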

Understanding Physical Distributed Routing Systems

For Control Path
• On RP (route processor): routing daemon peers with routers to build Route Tables
• On RP: Route Tables build the Forwarding Information Base (FIB)
• FIBs downloaded to LC (line card)
• FC (fabric card) programmed by RP with LC reachability information

For Data Path
• Traffic hits ingress LC (no RP involved)
• FCs route cells between LCs
• Traffic exits via egress LC

Routing Table – NSX Edge (ESG)

List of all ESGs:

nsx-mgr-east> show edge all
CLI commands for Edge ServiceGateway(ESG) start with 'show edge'
CLI commands for Distributed Logical Router(DLR) Control VM start with 'show edge'
CLI commands for Distributed Logical Router(DLR) start with 'show logical-router'

Edge ID   Name             Size   Version   Status
edge-1    esg-east-prm-1   Q      6.2.3     GREEN
edge-2    east-dlr-1       C      6.2.3     GREEN
{truncated …}

Routing Table at a specific ESG:

nsx-mgr-east> show edge edge-1 ip route

Codes: O - OSPF derived, i - IS-IS derived, B - BGP derived,
C - connected, S - static, L1 - IS-IS level-1, L2 - IS-IS level-2,
Total number of routes: 14

S      0.0.0.0/0          [0/0]     via 10.155.171.126
O N2   10.145.225.0/26    [110/1]   via 192.168.255.5
O N2   10.145.225.64/26   [110/1]   via 192.168.255.5

{truncated …}
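When the `show edge <edge-id> ip route` output is saved to a file or collected by a script, the route lines can be turned into records for comparison. The following is a minimal sketch that assumes the line layout shown on the slide (code, prefix, `[distance/metric]`, `via`, next hop); the function name `parse_routes` and the sample string are illustrative, and other releases may format the output differently.

```python
import re

# Route lines copied from the slide; header lines are ignored by the parser.
SAMPLE = """\
Total number of routes: 14
S 0.0.0.0/0 [0/0] via 10.155.171.126
O N2 10.145.225.0/26 [110/1] via 192.168.255.5
O N2 10.145.225.64/26 [110/1] via 192.168.255.5
"""

ROUTE_RE = re.compile(
    r"^(?P<code>[A-Za-z0-9 ]+?)\s+"
    r"(?P<prefix>\d+\.\d+\.\d+\.\d+/\d+)\s+"
    r"\[(?P<distance>\d+)/(?P<metric>\d+)\]\s+via\s+"
    r"(?P<next_hop>\d+\.\d+\.\d+\.\d+)\s*$"
)


def parse_routes(text):
    """Return one dict per route line; non-route lines are skipped."""
    routes = []
    for line in text.splitlines():
        match = ROUTE_RE.match(line.strip())
        if match:
            route = match.groupdict()
            route["distance"] = int(route["distance"])
            route["metric"] = int(route["metric"])
            routes.append(route)
    return routes
```

For the sample above, the parser yields three routes: the static default via 10.155.171.126 and two OSPF N2 routes via the DLR forwarding address 192.168.255.5.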

Routing Table – DLR Control VM & Controllers

Routes at the DLR Control VM:

nsx-mgr-east> show edge edge-2 ip route

Codes: O - OSPF derived, N2 - OSPF NSSA external type 2

O N2   0.0.0.0/0        [110/0]   via 192.168.255.1
O N2   0.0.0.0/0        [110/0]   via 192.168.255.2
O N2   0.0.0.0/0        [110/0]   via 192.168.255.3
O N2   10.140.54.0/26   [110/1]   via 192.168.255.1
O N2   10.140.54.0/26   [110/1]   via 192.168.255.2
{truncated...}

Routes at the Controller to be pushed to hosts:

nsx-mgr-east> show logical-router controller master dlr edge-2 route
Destination        Next-Hop[]      Preference   Source
0.0.0.0/0          192.168.255.2   110          CONTROL_VM
                   192.168.255.3
                   192.168.255.1
10.140.54.192/26   192.168.255.2   110          CONTROL_VM
                   192.168.255.3
                   192.168.255.1

{truncated...}
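Comparing what the DLR Control VM has learnt with what the Controller holds is the kind of check these two commands support. A small sketch of that comparison follows; the data structures and the `diff_routes` helper are hypothetical, and only the default route is used because it is the one prefix visible in both truncated outputs on the slide.

```python
from collections import defaultdict

# Default-route entries as shown at the DLR Control VM (one line per next hop).
control_vm_routes = [
    ("0.0.0.0/0", "192.168.255.1"),
    ("0.0.0.0/0", "192.168.255.2"),
    ("0.0.0.0/0", "192.168.255.3"),
]

# The same prefix as shown at the Controller (next hops grouped per prefix).
controller_routes = {
    "0.0.0.0/0": {"192.168.255.2", "192.168.255.3", "192.168.255.1"},
}


def group_by_prefix(pairs):
    """Collapse (prefix, next_hop) pairs into {prefix: set_of_next_hops}."""
    grouped = defaultdict(set)
    for prefix, next_hop in pairs:
        grouped[prefix].add(next_hop)
    return dict(grouped)


def diff_routes(control_vm, controller):
    """Return {prefix: (control_vm_hops, controller_hops)} where the two differ."""
    cvm = group_by_prefix(control_vm)
    mismatches = {}
    for prefix in sorted(set(cvm) | set(controller)):
        left = cvm.get(prefix, set())
        right = controller.get(prefix, set())
        if left != right:
            mismatches[prefix] = (left, right)
    return mismatches
```

An empty result means both views agree for the prefixes supplied; any entry points at the hop in the ESG, Control VM, Controller, Host chain where the route was lost.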

Route Table on Host DLR Instance

See the Routing Table for a DLR at a Host:

nsx-mgr-east> show logical-router host host-454 dlr edge-2 route

VDR default+edge-2 Route Table
Legend: [U: Up], [G: Gateway], [C: Connected], [I: Interface]
Legend: [H: Host], [F: Soft Flush] [!: Reject] [E: ECMP]

Destination     GenMask           Gateway         Flags   Interface
-----------     -------           -------         -----   ---------
0.0.0.0         0.0.0.0           192.168.255.3   UGE     1b5800000002
0.0.0.0         0.0.0.0           192.168.255.2   UGE     1b5800000002
0.0.0.0         0.0.0.0           192.168.255.1   UGE     1b5800000002
10.145.225.0    255.255.255.192   0.0.0.0         UCI     1b580000000e
10.145.225.64   255.255.255.192   0.0.0.0         UCI     1b580000000f
{truncated..}

To list the hosts in a cluster: show cluster <cluster-id>
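The Flags column is easier to read once expanded with the legend printed in the output. A short sketch, using only the legend shown on the slide; the helper names `decode_flags` and `parse_row` are illustrative.

```python
# Legend copied from the command output on the slide.
FLAG_LEGEND = {
    "U": "Up",
    "G": "Gateway",
    "C": "Connected",
    "I": "Interface",
    "H": "Host",
    "F": "Soft Flush",
    "!": "Reject",
    "E": "ECMP",
}


def decode_flags(flags):
    """Expand a flag string such as 'UGE' into the legend names, in order."""
    return [FLAG_LEGEND[ch] for ch in flags]


def parse_row(line):
    """Split one route-table row into named fields with decoded flags."""
    destination, genmask, gateway, flags, interface = line.split()
    return {
        "destination": destination,
        "genmask": genmask,
        "gateway": gateway,
        "flags": decode_flags(flags),
        "interface": interface,
    }
```

Applied to the sample rows, the default routes decode to Up, Gateway, ECMP, and the two /26 networks decode to Up, Connected, Interface.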

Logical Router’s ARP Table – At NSX Edge

Look at the ARP Table at an Edge:

nsx-mgr-east> show edge edge-1 arp
haIndex: 0
-----------------------------------------------------------------------
vShield Edge ARP Cache:
IP Address       Interface   MAC Address         State
192.168.254.5    vNic_2      02:50:56:56:44:52   REACHABLE
192.168.255.2    vNic_1      00:50:56:a0:df:a8   STALE
192.168.254.2    vNic_2      00:50:56:a0:10:6b   STALE
192.168.255.4    vNic_1      00:50:56:a0:0f:12   STALE
10.155.171.125   vNic_0      cc:46:d6:64:94:85   STALE
192.168.255.3    vNic_1      00:50:56:a0:e9:99   STALE
192.168.254.4    vNic_2      00:50:56:a0:b8:54   STALE
192.168.254.3    vNic_2      00:50:56:a0:4c:50   STALE
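A captured ARP cache can be filtered by state or interface with a few lines of parsing. The sketch below assumes the four-column layout shown on the slide; `parse_arp` and `by_state` are hypothetical helper names, and the sample is a subset of the slide output.

```python
# A subset of the ARP cache rows from the slide, including the header lines.
ARP_OUTPUT = """\
haIndex: 0
vShield Edge ARP Cache:
IP Address Interface MAC Address State
192.168.254.5 vNic_2 02:50:56:56:44:52 REACHABLE
192.168.255.2 vNic_1 00:50:56:a0:df:a8 STALE
10.155.171.125 vNic_0 cc:46:d6:64:94:85 STALE
"""


def parse_arp(text):
    """Return one dict per ARP row; header and separator lines are skipped."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly four columns and a six-octet MAC in column 3.
        if len(parts) == 4 and parts[2].count(":") == 5:
            ip, interface, mac, state = parts
            entries.append(
                {"ip": ip, "interface": interface, "mac": mac, "state": state}
            )
    return entries


def by_state(entries, state):
    """Return the IP addresses whose ARP entry is in the given state."""
    return [entry["ip"] for entry in entries if entry["state"] == state]
```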

Logical Switch’s MAC, ARP & VTEP Table – At Host

Look at all the VTEPs joined to a VNI:

nsx-mgr-east> show logical-switch host host-454 vni 7006 vtep
VTEP count: 3

Segment ID: 10.155.171.32
VTEP IP: 10.155.171.36

Segment ID: 10.155.171.32
VTEP IP: 10.155.171.33

Segment ID: 10.155.171.32
VTEP IP: 10.155.171.35

Look at the ARP Table for a VNI at a Host:

nsx-mgr-east> show logical-switch host host-454 vni 7006 arp
ARP entry count: 1

IP: 10.145.225.66
MAC: 00:0b:0b:0b:0b:0b

Look at the MAC Table for a VNI at a Host:

nsx-mgr-east> show logical-switch host host-454 vni 7006 mac
MAC entry count: 1

Inner MAC: 00:0b:0b:0b:0b:0b
Outer MAC: 00:50:56:6d:ee:e1
Outer IP: 10.155.171.36
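One cross-check these three tables allow is confirming that every Outer IP in the MAC table belongs to a VTEP that has joined the VNI. A minimal sketch, assuming the "Key: value" layout shown on the slide; the helper names are illustrative.

```python
VTEP_OUTPUT = """\
VTEP count: 3
Segment ID: 10.155.171.32
VTEP IP: 10.155.171.36
Segment ID: 10.155.171.32
VTEP IP: 10.155.171.33
Segment ID: 10.155.171.32
VTEP IP: 10.155.171.35
"""

MAC_OUTPUT = """\
MAC entry count: 1
Inner MAC: 00:0b:0b:0b:0b:0b
Outer MAC: 00:50:56:6d:ee:e1
Outer IP: 10.155.171.36
"""


def parse_key_values(text):
    """Parse 'Key: value' lines into (key, value) tuples, splitting on the first colon."""
    pairs = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            pairs.append((key.strip(), value.strip()))
    return pairs


def unknown_outer_ips(vtep_text, mac_text):
    """Return Outer IPs from the MAC table that are not in the VNI's VTEP list."""
    vteps = {v for k, v in parse_key_values(vtep_text) if k == "VTEP IP"}
    outer = [v for k, v in parse_key_values(mac_text) if k == "Outer IP"]
    return [ip for ip in outer if ip not in vteps]
```

For the slide's sample data the result is empty: the MAC entry for 00:0b:0b:0b:0b:0b points at 10.155.171.36, which is one of the three VTEPs on VNI 7006.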

Packet Capturing: Level 3/Level 4

Sample Packet Capture Tap Points

All packet capture commands start with
debug packet capture host <host-id> (e.g., host-454)

DLR Traffic
Leaving DLR: debug.. vdrport dir input parameters --vxlan 7006
Entering DLR: debug.. vdrport dir output parameters --vxlan 7005

Traffic Encapsulated in VXLAN
Leaving Host: debug.. vmnic vmnic2 dir output parameters --stage 1 --vxlan 7006
Entering Host: debug.. vmnic vmnic2 dir input parameters --stage 0 --vxlan 7005

VM Traffic
Leaving VM: debug.. vnic <vnic_id> dir input parameters
Entering VM: debug.. vnic <vnic_id> dir output parameters
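The tap-point commands can be assembled from the common prefix and the per-tap suffixes listed on the slide. The sketch below assumes the ".." on the slide abbreviates `packet capture host <host-id>`, as the slide's note states; the dictionary keys and the `build_command` helper are hypothetical names, and the generated strings should be checked against the CLI reference for the release in use.

```python
# Common prefix stated on the slide.
PREFIX = "debug packet capture host {host_id}"

# Suffixes copied from the slide; the keys are illustrative labels.
TAP_POINTS = {
    "leaving_dlr": "vdrport dir input parameters --vxlan {vni}",
    "entering_dlr": "vdrport dir output parameters --vxlan {vni}",
    "leaving_host": "vmnic {vmnic} dir output parameters --stage 1 --vxlan {vni}",
    "entering_host": "vmnic {vmnic} dir input parameters --stage 0 --vxlan {vni}",
    "leaving_vm": "vnic {vnic_id} dir input parameters",
    "entering_vm": "vnic {vnic_id} dir output parameters",
}


def build_command(tap, host_id, **params):
    """Return the full command string for one tap point on one host."""
    template = PREFIX + " " + TAP_POINTS[tap]
    return template.format(host_id=host_id, **params)
```

For example, `build_command("leaving_host", "host-454", vmnic="vmnic2", vni=7006)` expands to the uplink capture shown on the slide with the prefix filled in.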

Packet Capture – Follow the Packet

[Diagram: packet path between VM-A and VM-B across host-454 and host-456, each with a VTEP on vmk4 attached to the VDS]

• VM-A: IP 10.145.225.2, VXLAN 7005
• VM-B: IP 10.145.225.66, VXLAN 7006
• vMAC = 02:50:56:56:44:52

Capture points along the path
1. @Src VM Switchport
2. @DLR (Ingress)
3. @DLR (Egress)
4. @Src VTEP Uplink
5. @Dest VTEP Uplink
6. @Dest VM Switchport

Packet Capture – Source VM Switchport

Display Packet Capture (by Session ID):

Nsx-mgr-1> debug packet capture display session 133bc131-3563-4222-ac43 parameters -e

reading from file /tmp/pktcap/133bc131-3563-4222-ac43.pcap

20:02:16.806394 0a:0a:0a:0a:0a:0a > 02:50:56:56:44:52, ethertype IPv4 (0x0800), length 98: 10.145.225.2 > 10.145.225.66: ICMP echo request, id 23815, seq 18550, length 64

Fields in the captured frame
• VM-A MAC: 0a:0a:0a:0a:0a:0a
• vDR MAC: 02:50:56:56:44:52
• VM-A IP: 10.145.225.2
• VM-B IP: 10.145.225.66
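The check performed by eye on this slide (is the frame from VM-A addressed to the vDR MAC?) can be scripted against the displayed capture line. The sketch below is written for the exact line format shown on the slide, which resembles tcpdump output with the `-e` option; the pattern and helper names are illustrative and would need adjusting for other protocols or formats.

```python
import re

# Capture line copied from the slide.
LINE = (
    "20:02:16.806394 0a:0a:0a:0a:0a:0a > 02:50:56:56:44:52, "
    "ethertype IPv4 (0x0800), length 98: 10.145.225.2 > 10.145.225.66: "
    "ICMP echo request, id 23815, seq 18550, length 64"
)

# vDR MAC as labelled on the slide.
VDR_MAC = "02:50:56:56:44:52"

PATTERN = re.compile(
    r"^(?P<time>\S+) (?P<src_mac>[0-9a-f:]{17}) > (?P<dst_mac>[0-9a-f:]{17}), "
    r"ethertype IPv4 \(0x0800\), length \d+: "
    r"(?P<src_ip>[\d.]+) > (?P<dst_ip>[\d.]+): (?P<summary>.*)$"
)


def parse_frame(line):
    """Return the named fields of one capture line, or None if it does not match."""
    match = PATTERN.match(line)
    return match.groupdict() if match else None


def sent_to_dlr(frame):
    """True when the frame's destination MAC is the vDR MAC."""
    return frame["dst_mac"] == VDR_MAC
```

At the source VM switchport a routed packet should carry the vDR MAC as its destination, which is what the sample line shows.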

Support for Automation

• REST API (NET1305)
• PowerNSX (NET2119)
• PyNSXv (MTE4863)
• PowerOps (NET2532)
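As a hedged illustration of the REST API option, the sketch below only builds an HTTP request object and does not send it. It assumes that NSX Manager exposes the edge list at `/api/4.0/edges` with HTTP Basic authentication, which should be confirmed against the NSX for vSphere API guide for the release in use; the host name, credentials, and the `build_edges_request` helper are hypothetical.

```python
import base64
import urllib.request


def build_edges_request(manager, user, password):
    """Build (but do not send) a GET request for the NSX Manager edge list."""
    # Assumed endpoint; verify against the API guide for your NSX release.
    url = f"https://{manager}/api/4.0/edges"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/xml")
    return request
```

Sending the request would be a separate step (for example with `urllib.request.urlopen` and an appropriate TLS context), left out here so the sketch has no network dependency.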

Take Away

NSX Technology
• Logical Switching
• Distributed + Centralized Routing
• Load Balancing
• Distributed + Centralized FW
• And more…

NSX Native Tools
• UI Dashboards
• Flow Monitoring
• Endpoint Monitoring
• Traceflow
• Central CLI
• Packet Capturing
• And more…

vSphere Tools
• Netflow
• Esxtop
• Port Mirroring
• Syslog
• Pktcap-uw
• And more…

Integration with VMware Products
• vRealize Log Insight
• vRealize Network Insight
• vRealize Operations Manager
• And more…

Partner Eco-System
• EMC Smarts
• Gigamon
• HyTrust
• Tufin
• Riverbed
• AlgoSec
• And more…

References

• NSX-v Operations Guide, rev 1.5: https://communities.vmware.com/docs/DOC-30079

• NSX Troubleshooting Guide: https://docs.vmware.com/en/VMware-NSX-for-vSphere/6.3/nsx_63_troubleshooting.pdf

• NSX Administration Guide: https://docs.vmware.com/en/VMware-NSX-for-vSphere/6.3/nsx_63_admin.pdf

• NSX Command Line Quick Reference: https://docs.vmware.com/en/VMware-NSX-for-vSphere/6.3/com.vmware.nsx.troubleshooting.doc/GUID-18EDB577-1903-4110-8A0B-FE9647ED82B6.html

• Trending Support Issues in NSX for vSphere: https://kb.vmware.com/kb/2131154
