Troubleshooting OpenStack


Troubleshooting OpenStack Neutron services


  • EVERYTHING YOU ALWAYS WANTED TO KNOW ABOUT OPENSTACK NETWORK* * But Were Afraid to Ask

    AKA OpenStack debugging: VLAN setup

    Disclaimer: Here is a tentative guide to test and debug, mostly, the networking in the OpenStack cloud world.

    We have spent a huge amount of time looking at packet dumps in order to distill this information for you, in the belief that, following the recipes outlined in the following pages, you will have an easier time!

    Keep in mind that this comes more from day-by-day debugging than from a structured plan, so I tried to separate the pieces according to the architecture that I have in mind... but it is and will remain a work in progress.

    Reference setup: The setup is the following:

    1. compute node: Ubuntu Server 14.04, ethernet interfaces mapped on em1-4 (3 used)
    2. controller + compute node: Ubuntu Server 14.04, ethernet interfaces mapped on em1-4 (3 used)
    3. network node: Ubuntu Server 14.04, ethernet interfaces mapped on em1-4 (3 used)

    The networking configuration is implemented within the neutron service and is based on a VLAN approach, so as to obtain a complete L2 separation of a multi-tenant environment.

    Follow the OpenStack guide to configure the services (the appendix contains the configuration files used in this case and a few configuration scripts).

    Preliminary checks: Once you have agreed with your network administrators on the switch configuration (if you have no direct access to the switches), double-check the port configuration for the VLAN IDs:

    Capture an LLDP packet (ethertype 0x88cc) from each host and for each interface:

  • # tcpdump -vvv -s 1500 ether proto 0x88cc -i em1

    (wait for a packet and then CTRL-C)

    this command will give you some information about the switch that you are connected to and the VLAN configuration. NB: if the port is in trunk mode you may get the same result as if the port is without VLAN settings.

    An example of the output of the command for an interface attached to a port that is configured as access:

    tcpdump: WARNING: em1: no IPv4 address assigned
    tcpdump: listening on em1, link-type EN10MB (Ethernet), capture size 1500 bytes
    12:33:03.255101 LLDP, length 351
    [...]
    System Name TLV (5), length 13: stackdr2.GARR
      0x0000: 7374 6163 6b64 7232 2e47 4152 52
    [...]
    Port Description TLV (4), length 21: GigabitEthernet2/0/31
    [...]
    Organization specific TLV (127), length 6: OUI Ethernet bridged (0x0080c2)
      Port VLAN Id Subtype (1)
        port vlan id (PVID): 320
    [...]
    1 packet captured
    1 packet received by filter
    0 packets dropped by kernel

    An example of the output of the command for an interface attached to a port that is configured as trunk:

    # tcpdump -vvv -s 1500 ether proto 0x88cc -i em3
    tcpdump: WARNING: em3: no IPv4 address assigned
    tcpdump: listening on em3, link-type EN10MB (Ethernet), capture size 1500 bytes
    12:32:11.513135 LLDP, length 349
    [...]
    System Name TLV (5), length 13: stackdr2.GARR
    [...]
    Port Description TLV (4), length 20: GigabitEthernet2/0/3
    [...]
      Port VLAN Id Subtype (1)
        port vlan id (PVID): 1
    [...]
    ^C
    1 packet captured
    1 packet received by filter
    0 packets dropped by kernel

    Check Interfaces: On compute nodes, use the following command to see information about the interfaces (IPs, VLAN IDs) and to know whether the interfaces are up:

    # ip a

    One good initial sanity check is to make sure that your interfaces are up:

    # ip a | grep "em[1,3]" | grep state

    2: em3: mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
    6: em1: mtu 1500 qdisc mq state UP group default qlen 1000
    37: br-em3: mtu 1500 qdisc noqueue state UNKNOWN group default
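
    If one of the interfaces shows DOWN instead, a minimal fix to try (a sketch, assuming em3 is the interface in question and that it is simply administratively down) is:

    # ip link set dev em3 up
    # ip a show em3 | grep state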

    Troubleshooting Open vSwitch: Open vSwitch is a multilayer virtual switch. Full documentation can be found at the website. In practice you need to ensure that the required bridges (br-int, br-ex, br-em1, br-em3, etc.) exist and have the proper ports connected to them, with the ovs-vsctl and ovs-ofctl commands.

    To list the bridges on a system (VLAN networks are trunked through the em3 network interface):

    # ovs-vsctl list-br
    br-em3
    br-ex
    br-int

    Example: on the network node (you should follow the same logic on the compute one).

    Let's check the chain of ports and bridges. The bridge br-em3 contains the physical network interface em3 (trunk network) and the virtual interface phy-br-em3 attached to the int-br-em3 of the br-int:

    # ovs-vsctl list-ports br-em3
    em3
    phy-br-em3

    # ovs-vsctl show
        Bridge "br-em3"
            Port "em3"
                Interface "em3"
            Port "phy-br-em3"
                Interface "phy-br-em3"
                    type: patch
                    options: {peer="int-br-em3"}
            Port "br-em3"
                Interface "br-em3"
                    type: internal

    br-int contains int-br-em3, which pairs with phy-br-em3 to connect to the physical network (used to reach the compute nodes), the TAP devices that connect to the DHCP instances, and the TAP interfaces that connect to the virtual routers:

    # ovs-vsctl list-ports br-int
    int-br-em3
    int-br-ex
    qr-9ae4acd4-92
    qr-ae75168a-67
    qr-e323976e-2b
    qr-e3debf8d-ee
    tap1474f18d-a9
    tap7c29ce27-4e
    tapc974ab53-25
    tapd9762af3-4b

    # ovs-vsctl show
        Bridge br-int
            fail_mode: secure
            Port "tapd9762af3-4b"
                tag: 5
                Interface "tapd9762af3-4b"
                    type: internal
            Port int-br-ex
                Interface int-br-ex
                    type: patch
                    options: {peer=phy-br-ex}
    [...]
            Port "qr-9ae4acd4-92"
                tag: 1
                Interface "qr-9ae4acd4-92"
                    type: internal
            Port br-int
                Interface br-int
                    type: internal
            Port "tap1474f18d-a9"
                tag: 3
                Interface "tap1474f18d-a9"
                    type: internal

    # ovs-vsctl list-ports br-ex
        Bridge br-ex
            Port br-ex
                Interface br-ex
                    type: internal
            Port "em4"
                Interface "em4"
            Port phy-br-ex
                Interface phy-br-ex
                    type: patch
                    options: {peer=int-br-ex}

    If any of these links is missing or incorrect, it suggests a configuration error.

    NB: you can also check the correct VLAN tag translation along the overall chain with ovs-ofctl commands, i.e. (more details follow):

    # ovs-ofctl dump-flows br-int
    NXST_FLOW reply (xid=0x4):
     cookie=0x0, duration=6718.658s, table=0, n_packets=0, n_bytes=0, idle_age=6718, priority=3,in_port=1,dl_vlan=325 actions=mod_vlan_vid:4,NORMAL
     cookie=0x0, duration=6719.335s, table=0, n_packets=0, n_bytes=0, idle_age=6719, priority=3,in_port=1,dl_vlan=327 actions=mod_vlan_vid:3,NORMAL
     cookie=0x0, duration=6720.508s, table=0, n_packets=3, n_bytes=328, idle_age=6715, priority=3,in_port=1,dl_vlan=328 actions=mod_vlan_vid:1,NORMAL
     cookie=0x0, duration=5840.156s, table=0, n_packets=139, n_bytes=13302, idle_age=972, priority=3,in_port=1,dl_vlan=320 actions=mod_vlan_vid:5,NORMAL
     cookie=0x0, duration=6719.906s, table=0, n_packets=58, n_bytes=6845, idle_age=6464, priority=3,in_port=1,dl_vlan=324 actions=mod_vlan_vid:2,NORMAL
     cookie=0x0, duration=6792.845s, table=0, n_packets=555, n_bytes=100492, idle_age=9, priority=2,in_port=1 actions=drop
     cookie=0x0, duration=6792.025s, table=0, n_packets=555, n_bytes=97888, idle_age=9, priority=2,in_port=2 actions=drop
     cookie=0x0, duration=6793.667s, table=0, n_packets=203, n_bytes=22402, idle_age=4535, priority=1 actions=NORMAL
     cookie=0x0, duration=6793.605s, table=23, n_packets=0, n_bytes=0, idle_age=6793, priority=0 actions=drop

    Bridges can be added with ovs-vsctl add-br, and ports can be added to bridges with ovs-vsctl add-port.
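
    For example, if the trunk bridge or its port were missing, they could be recreated along these lines (a sketch using the br-em3/em3 names of this setup; normally the neutron OVS agent configuration takes care of this):

    # ovs-vsctl add-br br-em3
    # ovs-vsctl add-port br-em3 em3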

    Troubleshoot neutron traffic: Refer to the Cloud Administrator Guide for a variety of networking scenarios and their connection paths. We use the Open vSwitch (OVS) backend.

    See the following figure for reference.

    1. The instance generates a packet and sends it through the virtual NIC inside the instance, such as eth0.
    2. The packet transfers to a Test Access Point (TAP) device on the compute host, such as tap1d40b89c-fe.

    You can find out which TAP is being used by looking at the /etc/libvirt/qemu/instance-xxxxxxxx.xml file.

    Below is an example with the interesting parts highlighted:

    instance-00000015
    cc2b7876-6d3a-4b78-b817-ed36146a9b9e
    [....]
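
    A quick way to pull the TAP name out of that file, assuming the usual libvirt layout with a <target dev="tap..."/> element inside the <interface> section (commands are a sketch, adapt the instance name):

    # grep -A5 "<interface" /etc/libvirt/qemu/instance-00000015.xml | grep "target dev"
    # virsh dumpxml instance-00000015 | grep tap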

  • fig: Neutron network paths (see the networking scenarios chapter for more details)

    Looking also at the neutron part and highlighting the VLAN configuration, we have something like the following (I recycled the image, so br-eth1 is br-emXX in my setup and ethYY are emZZ, but the flow is the point that I want to stress here):

  • 1. The TAP device is connected to the integration bridge, br-int. This bridge connects all the instance TAP devices and any other bridges on the system. int-br-eth1 is one half of a veth pair connecting to the bridge br-eth1, which handles VLAN networks trunked over the physical Ethernet device eth1.

    2. The TAP devices and veth devices are normal Linux network devices and may be inspected with the usual tools, such as ip and tcpdump. Open vSwitch internal devices are only visible within the Open vSwitch environment.

    # tcpdump -i int-br-em3
    tcpdump: int-br-em3: No such device exists
    (SIOCGIFHWADDR: No such device)

    3. To watch packets on internal interfaces you need to create a dummy network device and add it to the bridge containing the internal interface you want to snoop on. Then tell Open vSwitch to mirror all traffic to or from the internal port onto this dummy port, so that you can run tcpdump on the dummy interface and see the traffic on the internal port.

    4. Capture packets from an internal interface on integration bridge, br-int (advanced):

    1. Create and bring up a dummy interface, snooper0:
    2. # ip link add name snooper0 type dummy
    3. # ip link set dev snooper0 up

    4. Add device snooper0 to bridge br-int:

    # ovs-vsctl add-port br-int snooper0

    5. Create a mirror of, for example, the int-br-em3 interface to snooper0 (all in one line; returns the UUID of the mirror port):

  • # ovs-vsctl -- set Bridge br-int mirrors=@m -- --id=@snooper0 get Port snooper0 -- --id=@int-br-em3 get Port int-br-em3 -- --id=@m create Mirror name=mymirror select-dst-port=@int-br-em3 select-src-port=@int-br-em3 output-port=@snooper0

    dcce2c59-be1a-4f2d-b00b-9d906c77ee8a

    6. and from here you can see the traffic going through int-br-em3 with a tcpdump -i snooper0.

    7. Clean up mirrors:
    # ovs-vsctl clear Bridge br-int mirrors
    # ovs-vsctl del-port br-int snooper0
    # ip link delete dev snooper0

    On the integration bridge, networks are distinguished using internal VLAN IDs (unrelated to the segmentation IDs used in the network definition and on the physical wire) regardless of how the networking service defines them. This allows instances on the same host to communicate directly without transiting the rest of the virtual, or physical, network. On the br-int, incoming packets are translated from external tags to internal tags. Other translations also happen on the other bridges and will be discussed later.

    5. To discover which internal VLAN tag is in use for a given external VLAN, use the ovs-ofctl command:

    1. Find the external VLAN tag of the network you're interested in with:

    # neutron net-show --fields provider:segmentation_id <network name>
    +---------------------------+-------+
    | Field                     | Value |
    +---------------------------+-------+
    | provider:network_type     | vlan  |
    | provider:segmentation_id  | 324   |
    +---------------------------+-------+

    2. Grep for the provider:segmentation_id, 324 in this case, in the output of ovs-ofctl dump-flows br-int:

    # ovs-ofctl dump-flows br-int | grep vlan=324

    cookie=0x0, duration=105039.122s, table=0, n_packets=5963, n_bytes=482203, idle_age=1104, hard_age=65534, priority=3,in_port=1,dl_vlan=324 actions=mod_vlan_vid:1,NORMAL

    3. Here you can see that packets received on port ID 1 with the VLAN tag 324 are modified to have the internal VLAN tag 1. Digging a little deeper, you can confirm that port 1 is in fact:

    4. # ovs-ofctl show br-int
    OFPT_FEATURES_REPLY (xid=0x2): dpid:0000029a51549b40
    n_tables:254, n_buffers:256
    capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
    actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
     1(int-br-em3): addr:52:40:bd:b3:88:9c
         config: 0
         state: 0
         speed: 0 Mbps now, 0 Mbps max
     2(qvof3b63d31-a0): addr:4e:db:74:04:53:4d
         config: 0
         state: 0
         current: 10GB-FD COPPER
         speed: 10000 Mbps now, 0 Mbps max
     3(qvo65fb5ad8-b5): addr:92:75:b8:03:cc:1d
         config: 0
         state: 0
         current: 10GB-FD COPPER
         speed: 10000 Mbps now, 0 Mbps max
     4(qvoa6e8c6e3-1c): addr:82:22:71:c5:4e:f8
         config: 0
         state: 0
         current: 10GB-FD COPPER
         speed: 10000 Mbps now, 0 Mbps max
     5(qvo1d40b89c-fe): addr:5e:e3:15:53:e5:16
         config: 0
         state: 0
         current: 10GB-FD COPPER
         speed: 10000 Mbps now, 0 Mbps max
     6(qvoff8e411e-6e): addr:02:a9:38:d6:88:22
         config: 0
         state: 0
         current: 10GB-FD COPPER
         speed: 10000 Mbps now, 0 Mbps max
     LOCAL(br-int): addr:02:9a:51:54:9b:40
         config: 0
         state: 0
         speed: 0 Mbps now, 0 Mbps max
    OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

    5. (NB: this is NOT valid if you are using a GRE tunnel.) VLAN-based networks exit the integration bridge via a veth interface, i.e. int-br-em3 (int-br-eth1 in the picture), and arrive on a bridge, i.e. br-em3 (br-eth1), on the other member of the veth pair, phy-br-em3 (phy-br-eth1). Packets on this interface arrive with internal VLAN tags and are translated to external tags in the reverse of the process described above:

    # ovs-ofctl dump-flows br-em3 | grep 324

    cookie=0x0, duration=105402.89s, table=0, n_packets=7374, n_bytes=905197, idle_age=1468, hard_age=65534, priority=4,in_port=2,dl_vlan=1 actions=mod_vlan_vid:324,NORMAL

    6. Packets, now tagged with the external VLAN tag, then exit onto the physical network via em3 (eth1). The Layer 2 switch this interface is connected to must be configured as a trunk on the VLAN IDs used. The next hop for this packet must also be on the same layer-2 network.

    6. The packet is then received on the network node. Note that any traffic to the l3-agent or dhcp-agent will be visible only within their network namespace. Watching any interfaces outside those namespaces, even those that carry the network traffic, will only show broadcast packets like Address Resolution Protocol (ARP), but unicast traffic to the router or DHCP address will not be seen. See Dealing with Network Namespaces for detail on how to run commands within these namespaces.

    7. Alternatively, it is possible to configure VLAN-based networks to use external routers rather than the l3-agent shown here, so long as the external router is on the same VLAN:

    1. VLAN-based networks are received as tagged packets on a physical network interface, eth1 in this example. Just as on the compute node, this interface is a member of the br-eth1 bridge.

    2. GRE-based networks will be passed to the tunnel bridge br-tun, which behaves just like the GRE interfaces on the compute node.

    8. Next, the packets from either input go through the integration bridge, again just as on the compute node.
    9. The packet then makes it to the l3-agent. This is actually another TAP device within the router's network namespace. Router namespaces are named in the form qrouter-<router UUID>. Running ip a within the namespace will show the TAP device name, qr-e6256f7d-31 in this example:
    10. # ip netns exec qrouter-e521f9d0-a1bd-4ff4-bc81-78a60dd88fe5 ip a | grep state
    10: qr-e6256f7d-31: mtu 1500 qdisc noqueue state UNKNOWN
    11: qg-35916e1f-36: mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 500
    28: lo: mtu 16436 qdisc noqueue state UNKNOWN

    11. The qg interface in the l3-agent router namespace sends the packet on to its next hop through device eth2 on the external bridge br-ex. This bridge is constructed similarly to br-eth1 and may be inspected in the same way.

    12. This external bridge also includes a physical network interface, eth2 in this example, which finally lands the packet on the external network destined for an external router or destination.

    13. DHCP agents running on OpenStack networks run in namespaces similar to the l3-agents. DHCP namespaces are named qdhcp-<network UUID> and have a TAP device on the integration bridge. Debugging of DHCP issues usually involves working inside this network namespace.

    Debug a problem along the path: Ping is your best friend! From an instance:

    1. See whether you can ping an external host, such as 8.8.8.8 (Google, which usually is up: from stats, 99.9%).
    2. If you can't, try the IP address of the compute node where the virtual machine is hosted.
    3. If you can ping this IP, then the problem is somewhere between the compute node and that compute node's gateway.
    4. If you can't, the problem is between the instance and the compute node. Check also the bridge connecting the compute node's main NIC with the vnet NIC of the VM (see the sketch after this list).
    5. Launch a second instance and see whether the two instances can ping each other. If they can, the issue might be related to the firewall on the compute node. See further on for iptables debugging.
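
    For the check in point 4, a minimal sketch on the compute node (assuming the bridge-utils package is installed; the qbr/qvb/tap names match the per-instance Linux bridge shown later in this guide):

    # brctl show

    The qbrXXXXXXXX-XX bridge of the instance should list both its qvbXXXXXXXX-XX and tapXXXXXXXX-XX interfaces; if one is missing, the traffic is cut off before it even reaches OVS.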

    tcpdump: This is your second best friend to help with troubleshooting network issues. Using tcpdump at several points along the network path should help find where the problem is.

    For example, run the following command:

    tcpdump -i any -n -v \ 'icmp[icmptype] = icmp-echoreply or icmp[icmptype] = icmp-echo'

    on:

    1. An external server outside of the cloud (in the example, 193.206.159.201)
    2. A compute node
    3. An instance running on that compute node

    In this example, these locations have the following IP addresses:

    Instance         10.0.2.24   203.0.113.30
    Compute Node     10.0.0.42   203.0.113.34
    External Server  1.2.3.4

    Next, open a new shell to the instance and then ping the external host where tcpdump is running. If the network path to the external server and back is fully functional, you see something like the following:

    On the external server:
    $ tcpdump -i any -n -v \ 'icmp[icmptype] = icmp-echoreply or icmp[icmptype] = icmp-echo'
    tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
    10:20:23.517242 IP (tos 0x0, ttl 64, id 65416, offset 0, flags [none], proto ICMP (1), length 84)
        193.206.159.201 > 90.147.91.10: ICMP echo reply, id 1606, seq 28, length 64

  • which received the ping request and sent a ping reply. On the compute node you can follow the traffic along the path:

    1. on the TAP device which is connecting the VM to the Linux bridge (to find the TAP device, see above)

    # tcpdump -i tap88ab3af7-7d -n -v \ 'icmp[icmptype] = icmp-echoreply or icmp[icmptype] = icmp-echo'
    tcpdump: WARNING: tap88ab3af7-7d: no IPv4 address assigned
    tcpdump: listening on tap88ab3af7-7d, link-type EN10MB (Ethernet), capture size 65535 bytes
    10:36:31.000419 IP (tos 0x0, ttl 64, id 1469, offset 0, flags [DF], proto ICMP (1), length 84)
        192.168.4.103 > 8.8.8.8: ICMP echo request, id 1709, seq 1, length 64

    2. on the two sides of the veth pair between the Linux bridge and the OVS br-int

    # tcpdump -i qbr88ab3af7-7d -n -v \ 'icmp[icmptype] = icmp-echoreply or icmp[icmptype] = icmp-echo'
    tcpdump: WARNING: qbr88ab3af7-7d: no IPv4 address assigned
    tcpdump: listening on qbr88ab3af7-7d, link-type EN10MB (Ethernet), capture size 65535 bytes
    10:36:59.035767 IP (tos 0x0, ttl 64, id 1497, offset 0, flags [DF], proto ICMP (1), length 84)
        192.168.4.103 > 8.8.8.8: ICMP echo request, id 1709, seq 29, length 64

    root@compute:~# tcpdump -i qvb88ab3af7-7d -n -v \ 'icmp[icmptype] = icmp-echoreply or icmp[icmptype] = icmp-echo'
    tcpdump: WARNING: qvb88ab3af7-7d: no IPv4 address assigned
    tcpdump: listening on qvb88ab3af7-7d, link-type EN10MB (Ethernet), capture size 65535 bytes
    10:37:18.058899 IP (tos 0x0, ttl 64, id 1516, offset 0, flags [DF], proto ICMP (1), length 84)
        192.168.4.103 > 8.8.8.8: ICMP echo request, id 1709, seq 48, length 64

    3. and finally on the outgoing interface (em1 in the example)

    # tcpdump -i em1 -n -v \ 'icmp[icmptype] = icmp-echoreply or icmp[icmptype] = icmp-echo'
    tcpdump: WARNING: em1: no IPv4 address assigned
    tcpdump: listening on em1, link-type EN10MB (Ethernet), capture size 65535 bytes
    10:37:49.099383 IP (tos 0x0, ttl 64, id 1547, offset 0, flags [DF], proto ICMP (1), length 84)
        192.168.4.103 > 8.8.8.8: ICMP echo request, id 1709, seq 79, length 64

    On the instance:

    # tcpdump -i any -n -v \ 'icmp[icmptype] = icmp-echoreply or icmp[icmptype] = icmp-echo'
    tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
    09:27:04.801759 IP (tos 0x0, ttl 64, id 36704, offset 0, flags [DF], proto ICMP (1), length 84)
        192.168.4.103 > 192.168.21.107: ICMP echo request, id 1693, seq 27, length 64

    NB: it can be useful to show the VLAN tag when debugging traffic. To do this use:

    # tcpdump -i <interface> -U -w - | tcpdump -e -n -r - vlan

  • iptables and security rules: OpenStack Compute automatically manages iptables, including forwarding packets to and from instances on a compute node, forwarding floating IP traffic, and managing security group rules.

    iptables-save

    shows you all the rules.
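
    To narrow the output down to a single instance, one possibility (a sketch, assuming the per-port security group chains reference the TAP device name found earlier) is:

    # iptables-save | grep tap88ab3af7-7d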

    Example of setup of security rules. To show the security rules:

    # nova secgroup-list-rules default
    +-------------+-----------+---------+----------+--------------+
    | IP Protocol | From Port | To Port | IP Range | Source Group |
    +-------------+-----------+---------+----------+--------------+
    |             |           |         |          | default      |
    |             |           |         |          | default      |
    +-------------+-----------+---------+----------+--------------+

    To set up a rule to make ICMP traffic pass through:

    nova secgroup-add-rule default icmp -1 -1 0.0.0.0/0

    +-------------+-----------+---------+-----------+--------------+
    | IP Protocol | From Port | To Port | IP Range  | Source Group |
    +-------------+-----------+---------+-----------+--------------+
    | icmp        | -1        | -1      | 0.0.0.0/0 |              |
    |             |           |         |           | default      |
    |             |           |         |           | default      |
    +-------------+-----------+---------+-----------+--------------+

    Troubleshooting DNS: The SSH server does a reverse DNS lookup on the IP address that you are connecting from, so if you can use SSH to log into an instance but it takes on the order of a minute, then you might have a DNS issue.

    A quick way to check whether DNS is working is to resolve a hostname inside your instance by using the host command. If DNS is working, you should see:

    # host garr.it
    garr.it mail is handled by 15 lx1.dir.garr.it.
    garr.it mail is handled by 20 lx5.dir.garr.it.

    Note: If you're running the Cirros image, it doesn't have the "host" program installed, in which case you can use ping to try to access a machine by hostname to see whether it resolves.
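
    For example, from inside the instance (a sketch, reusing the hostname above just to exercise the resolver):

    $ ping -c 1 garr.it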

    Dealing with Network Namespaces Linux network namespaces are a kernel feature the networking service uses to support multiple isolated layer-2 networks with overlapping IP address ranges. Your network nodes will run their dhcp-agents and l3-agents in isolated namespaces. NB Network interfaces

  • and traffic on those interfaces will not be visible in the default namespace. L3-agent router namespaces are named qrouter-<router UUID>, and dhcp-agent namespaces are named qdhcp-<network UUID>. To see whether you are using namespaces, run ip netns:

    # ip netns
    qrouter-80fdf884-37c3-4d33-a340-cd1a09510e59
    qdhcp-c3cfc51b-f07c-47ae-bdb4-b029035c08d7
    qdhcp-f7bff056-1d27-4c12-a917-6ffe2925a44b
    qrouter-edcb7cb5-37fd-4b31-81c5-cee1bda75369
    qdhcp-286f2844-6b76-42e5-9664-ab5123bde2d5
    qrouter-3618b020-4f3c-4a72-8c02-e25db0c4769d
    qdhcp-c8a29266-e9ac-45e0-be6d-79c32f501194
    qrouter-301f264a-8ef1-413d-b252-c0886fc2c815
    qrouter-9d378195-ee93-45f0-b27f-2bd48b774f5a
    qdhcp-13c334c1-ad39-4c51-b396-953430059b22

    This output shows a network node with 5 networks running dhcp-agents, each also running an l3-agent router. A list of existing networks and their UUIDs can be obtained by running neutron net-list with administrative credentials.

    # neutron net-list

    +--------------------------------------+------------------+-----------------------------------------------------+
    | id                                   | name             | subnets                                             |
    +--------------------------------------+------------------+-----------------------------------------------------+
    | 13c334c1-ad39-4c51-b396-953430059b22 | intnet324        | edd7678a-277c-477e-a5ac-84258e6b1794 192.168.1.0/24 |
    | 286f2844-6b76-42e5-9664-ab5123bde2d5 | inafnet          | dbf5bd19-de67-4b84-a97b-8e322f9343dc 192.168.3.0/24 |
    | 99e9c208-b72a-427f-97f6-2443cdd6de9c | extnetflat319    | e0ef8d6f-3fa9-4a05-ae2c-5ec229357f4b 90.147.91.0/24 |
    | b4ef2523-bebe-4dbe-b5b7-82983fec6be8 | extnetflat319bis | 91ccda54-2af1-4a59-bf08-8bb0821c1c08 90.147.91.0/24 |
    | c3cfc51b-f07c-47ae-bdb4-b029035c08d7 | intnet328        | 0d36feb3-4c83-4867-a227-fb972564125c 192.168.8.0/24 |
    | c8a29266-e9ac-45e0-be6d-79c32f501194 | ingvnet          | 915f9929-e49b-4a95-a193-c71227ff870d 192.168.2.0/24 |
    | f7bff056-1d27-4c12-a917-6ffe2925a44b | eneanet          | d9d1ba30-4a14-4aab-a95f-4ed2c3f895d3 192.168.4.0/24 |
    +--------------------------------------+------------------+-----------------------------------------------------+

    Once you've determined which namespace you need to work in, you can use any of the debugging tools mentioned earlier by prefixing the command with ip netns exec <namespace>. For example, to see what network interfaces exist in the first qdhcp namespace returned above, do this:

    # ip netns exec qdhcp-f7bff056-1d27-4c12-a917-6ffe2925a44b ip a
    1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    61: tapd9762af3-4b: mtu 1500 qdisc noqueue state UNKNOWN group default
        link/ether fa:16:3e:b8:2e:0c brd ff:ff:ff:ff:ff:ff
        inet 192.168.4.100/24 brd 192.168.4.255 scope global tapd9762af3-4b
           valid_lft forever preferred_lft forever
        inet6 fe80::f816:3eff:feb8:2e0c/64 scope link
           valid_lft forever preferred_lft forever

    From this you see that the DHCP server on that network is using the tapd9762af3-4b device and has an IP address of 192.168.4.100. The usual commands mentioned previously can be run in the same way.

    note: It is also possible to run a shell and have an interactive session within the namespace, i.e.:

    # ip netns exec qdhcp-f7bff056-1d27-4c12-a917-6ffe2925a44b bash
    root@network:~# ifconfig
    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:65536  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

    tapd9762af3-4b Link encap:Ethernet  HWaddr fa:16:3e:b8:2e:0c
              inet addr:192.168.4.100  Bcast:192.168.4.255  Mask:255.255.255.0
              inet6 addr: fe80::f816:3eff:feb8:2e0c/64 Scope:Link
              UP BROADCAST RUNNING  MTU:1500  Metric:1
              RX packets:22 errors:0 dropped:0 overruns:0 frame:0
              TX packets:9 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:1788 (1.7 KB)  TX bytes:738 (738.0 B)

    Mapping of physnet vs network inside the neutron DB: Sometimes there can be an unclear (from the logs' point of view) error that claims not to find suitable resources at the moment of VM creation. It could be related to a problem in the neutron DB. To find out:

    1. check that nova services are running on the compute nodes and controller

    # nova service-list
    +----+------------------+------------+----------+---------+-------+----------------------------+-----------------+
    | Id | Binary           | Host       | Zone     | Status  | State | Updated_at                 | Disabled Reason |
    +----+------------------+------------+----------+---------+-------+----------------------------+-----------------+
    | 1  | nova-compute     | compute    | nova     | enabled | up    | 2015-02-12T13:52:45.000000 |                 |
    | 2  | nova-cert        | controller | internal | enabled | up    | 2015-02-12T13:52:40.000000 |                 |
    | 3  | nova-consoleauth | controller | internal | enabled | up    | 2015-02-12T13:52:40.000000 |                 |
    | 4  | nova-scheduler   | controller | internal | enabled | up    | 2015-02-12T13:52:45.000000 |                 |
    | 5  | nova-conductor   | controller | internal | enabled | up    | 2015-02-12T13:52:44.000000 |                 |
    | 6  | nova-compute     | controller | nova     | enabled | up    | 2015-02-12T13:52:46.000000 |                 |
    +----+------------------+------------+----------+---------+-------+----------------------------+-----------------+

    2. check that there are enough hardware resources

    # nova hypervisor-stats
    +----------------------+--------+
    | Property             | Value  |
    +----------------------+--------+
    | count                | 2      |
    | current_workload     | 0      |
    | disk_available_least | 1130   |
    | free_disk_gb         | 1274   |
    | free_ram_mb          | 367374 |
    | local_gb             | 1454   |
    | local_gb_used        | 180    |
    | memory_mb            | 386830 |
    | memory_mb_used       | 19456  |
    | running_vms          | 6      |
    | vcpus                | 80     |
    | vcpus_used           | 9      |
    +----------------------+--------+

    3. check that there is no problem in the mapping of physnets and networks in the neutron DB (i.e. trunknet is our VLAN-tagged network)

    select * from ml2_vlan_allocations;

    +------------------+---------+-----------+
    | physical_network | vlan_id | allocated |
    +------------------+---------+-----------+
    | trunknet         |     319 |         0 |
    | trunknet         |     320 |         0 |
    | trunknet         |     321 |         0 |
    | trunknet         |     322 |         0 |
    | trunknet         |     323 |         0 |
    | trunknet         |     324 |         0 |
    | trunknet         |     325 |         0 |
    | trunknet         |     326 |         0 |
    | trunknet         |     327 |         0 |
    | trunknet         |     328 |         0 |
    +------------------+---------+-----------+
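
    If you want to run that query by hand, a minimal sketch (assuming the MySQL server on the controller and the root credentials used in the backup section later on) is:

    # mysql -u root -p neutron -e "select * from ml2_vlan_allocations;"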

    Debugging with logs - Where are the logs? Below is a quick summary table of the service log locations; more in the OpenStack log locations documentation.

    Node type            Service                                       Log location
    Cloud controller     nova-*                                        /var/log/nova
    Cloud controller     glance-*                                      /var/log/glance
    Cloud controller     cinder-*                                      /var/log/cinder
    Cloud controller     keystone-*                                    /var/log/keystone
    Cloud controller     neutron-*                                     /var/log/neutron
    Cloud controller     horizon                                       /var/log/apache2/
    All nodes            misc (swift, dnsmasq)                         /var/log/syslog
    Compute nodes        libvirt                                       /var/log/libvirt/libvirtd.log
    Compute nodes        Console (boot-up messages) for VM instances   /var/lib/nova/instances/instance-<instance id>/console.log
    Block Storage nodes  cinder-volume                                 /var/log/cinder/cinder-volume.log
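
    A quick first pass over these logs, for example on the controller (a sketch; the exact file names depend on the services you are running):

    # grep -i error /var/log/neutron/*.log | tail -20
    # tail -f /var/log/nova/nova-api.log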

    Backup + Recovery (for Real): This chapter describes only how to back up configuration files and databases that the various OpenStack components need to run. It does not describe how to back up objects inside Object Storage or data contained inside Block Storage.

    Database Backups: The cloud controller is the MySQL server hosting the databases for nova, glance, cinder, and keystone. To create a database backup:

    # mysqldump -u root -h controller -p --all-databases > openstack.sql

    To back up a single database (e.g. nova) you can run:

    # mysqldump -u root -h controller -p nova > nova.sql

    You can easily automate this process. The following script dumps the entire MySQL database and deletes any backups older than seven days:

    #!/bin/bash
    backup_dir="/var/lib/backups/mysql"
    filename="${backup_dir}/mysql-`hostname`-`eval date +%Y%m%d`.sql.gz"
    # Dump the entire MySQL database
    /usr/bin/mysqldump -u root -p123grid --all-databases | gzip > $filename
    # Delete backups older than 7 days
    find $backup_dir -ctime +7 -type f -delete
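
    To run it automatically, one possibility is a daily cron entry (a sketch, assuming the script above is saved as /usr/local/bin/mysql-backup.sh and made executable):

    # crontab -e
    0 2 * * * /usr/local/bin/mysql-backup.sh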

    File System Backups

    Compute

    The /etc/nova directory on both the cloud controller and compute nodes should be backed up.

    /var/lib/nova is another directory to back up.

    note: it is not useful to back up the /var/lib/nova/instances subdirectory on compute nodes, which contains the KVM images of running instances, unless you need to maintain backup copies of all instances.

    Image Catalog and Delivery

    /etc/glance and /var/log/glance should be backed up.

    /var/lib/glance should also be backed up.

  • There are two ways to ensure stability with this directory. The first is to make sure this directory is run on a RAID array: if a disk fails, the directory is available. The second way is to use a tool such as rsync to replicate the images to another server:

    # rsync -az --progress /var/lib/glance/images backup-server:/var/lib/glance/images/

    Identity

    /etc/keystone and /var/log/keystone follow the same rules as the other components.

    /var/lib/keystone should not contain any data being used.

    Recovering Backups: Recovering backups is a simple process.

    1. Ensure that the service you are recovering is not running, e.g. in the case of nova:

    # stop nova-cert
    # stop nova-consoleauth
    # stop nova-novncproxy
    # stop nova-objectstore
    # stop nova-scheduler

    2. Import a previously backed-up database:

    # mysql -u root -p --one-database neutron < openstack.sql
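
    Once the import completes, the services stopped in step 1 can be started again, e.g. for nova (a sketch, mirroring the stop commands above):

    # start nova-cert
    # start nova-consoleauth
    # start nova-novncproxy
    # start nova-objectstore
    # start nova-scheduler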

  • Request flow for provisioning an instance:

    1. The dashboard or CLI gets the user credentials and authenticates with the Identity Service via REST API.
    2. The Identity Service authenticates the user with the user credentials, and then generates and sends back an auth token which will be used for sending the request to other components through REST calls.
    3. The dashboard or CLI converts the new instance request specified in the launch instance or nova boot form to a REST API request and sends it to nova-api.
    4. nova-api receives the request and sends a request to the Identity Service for validation of the auth token and access permission.
    5. The Identity Service validates the token and sends updated authentication headers with roles and permissions.
    6. nova-api checks for conflicts with the nova database.
    7. nova-api creates an initial database entry for the new instance.
    8. nova-api sends the rpc.call request to nova-scheduler expecting to get an updated instance entry with the host ID specified.
    9. nova-scheduler picks up the request from the queue.
    10. nova-scheduler interacts with the nova database to find an appropriate host via filtering and weighing.
    11. nova-scheduler returns the updated instance entry with the appropriate host ID after filtering and weighing.
    12. nova-scheduler sends the rpc.cast request to nova-compute for launching an instance on the appropriate host.
    13. nova-compute picks up the request from the queue.
    14. nova-compute sends the rpc.call request to nova-conductor to fetch the instance information such as host ID and flavor (RAM, CPU, disk).
    15. nova-conductor picks up the request from the queue.
    16. nova-conductor interacts with the nova database.
    17. nova-conductor returns the instance information.
    18. nova-compute picks up the instance information from the queue.
    19. nova-compute performs the REST call by passing the auth token to glance-api. Then, nova-compute uses the Image ID to retrieve the Image URI from the Image Service, and loads the image from the image storage.
    20. glance-api validates the auth token with keystone.
    21. nova-compute gets the image metadata.
    22. nova-compute performs the REST call by passing the auth token to the Network API to allocate and configure the network so that the instance gets the IP address.
    23. neutron-server validates the auth token with keystone.
    24. nova-compute retrieves the network info.
    25. nova-compute performs the REST call by passing the auth token to the Volume API to attach volumes to the instance.
    26. cinder-api validates the auth token with keystone.
    27. nova-compute retrieves the block storage info.
    28. nova-compute generates data for the hypervisor driver and executes the request on the hypervisor (via libvirt or API).

    Configuration options

    All the details about configuration options can be found here: http://docs.openstack.org/juno/config-reference/content/index.html

    Any comment is more than welcome.

    Alex