Troubleshooting AmLight - Internet2 · Troubleshooting AmLight: Handling Network Events in a...

Jeronimo Bezerra, Florida International University <[email protected]>. Internet2 Technology Exchange, Miami, Sep 26th 2016. Troubleshooting AmLight: Handling Network Events in a Production SDN Environment. Marcos Schwarz, Rede Nacional de Ensino e Pesquisa <[email protected]>


Page 1: Troubleshooting AmLight - Internet2 · Troubleshooting AmLight: Handling Network Events in a Production SDN Environment ... – ONOS/SDN-IP for Academic IPv4 – Currently 5 slices for

Jeronimo Bezerra, Florida International University

<[email protected]>

Internet2 Technology Exchange, Miami, Sep 26th 2016

Troubleshooting AmLight: Handling Network Events in a Production SDN Environment

Marcos Schwarz, Rede Nacional de Ensino e Pesquisa

<[email protected]>

Page 2

Outline

•  Introduction to AmLight
•  RFC 7426: SDN Terminology
•  Tests pre-production
•  SDN Topologies
•  What should be monitored?
    –  Control Plane Monitoring
    –  Data Plane Monitoring
•  Future

Page 3

AmLight is a Distributed Academic Exchange Point

•  Production SDN infrastructure since Aug 2014
•  Partnership involving FIU, NSF, ANSP, RNP, RedClara and AURA
•  Connects two academic exchange points: AMPATH/Miami and SouthernLight/Brazil
•  Carries academic and non-academic/commercial traffic
    –  L2VPN, IPv4, IPv6, Multicast
•  Supports network programmability/slicing
    –  OpenFlow 1.0
    –  FlowSpace Firewall for network programmability/slicing
    –  OESS for L2VPNs
    –  OGF Network Service Interface (NSI) enabled
    –  ONOS/SDN-IP for academic IPv4
    –  Currently 5 slices for experimentation (including Global ONOS SDN-IP)
•  Currently operating with more than 800 flows (production and experimentation)
•  Website: www.sdn.amlight.net

Page 4

AmLight SDN Stack

[Figure: AmLight SDN stack. Layer labels: Physical Layer (Ampath1, Ampath2, Andes1, Andes2, SouthernLight); Southbound API: OpenFlow 1.0; Virtualization/Slices (FlowSpace Firewall); Northbound: Users' APIs. Other labels in the figure: NOX, OESS, OSCARS, OpenNSA, ONOS, SDN-IP, IDCP, NSI, AmLight's NRENs, Other NRENs, FIBRE, Univ. Twente, Internet2, Other Testbeds.]

Page 5

SDN: Layers and Architecture Terminology

•  This presentation will use the SDN terminology standardized through IETF RFC 7426:
    –  Four planes:
        •  Application, Control, Forwarding & Management Planes
    –  Interfaces:
        •  Service, Control Plane Southbound and Management Plane Southbound interfaces
    –  Services and Applications

[Figure: RFC 7426 layer diagram. Application Plane (Applications and Services); Network Services Abstraction Layer (NSAL); Service Interface; Control Plane with Control Abstraction Layer (CAL); Management Plane with Management Abstraction Layer (MAL); CP Southbound Interface and MP Southbound Interface; Device and Resource Abstraction Layer (DAL); Forwarding Device containing the Forwarding Plane and the Operational Plane.]

Page 6

Tests pre-production

•  Before applying any change to the SDN environment, all planes, apps and services need to be validated in a controlled environment
    –  The same software and devices used in production need to be available for tests
•  Many tools and approaches are available, for example OFTest, Ryu Switch Test, Cbench and some commercial possibilities
    –  Some tests might cause instability in the SDN stack (don't try these tests in production)
•  Special attention is required for the Control and Data planes
    –  Many publications with different methodologies and tests

[Figure: the RFC 7426 layer diagram annotated with test tools. Annotations: "OFTest, Ryu Switch Test, Cbench, ..."; "OFTest, Ryu Switch Test, ..."; "Unittest, ...".]

Page 7

Troubleshooting a production SDN network

•  Troubleshooting a production environment has different requirements
    –  It needs to be agile and as non-disruptive as possible
    –  It might need historical information and an understanding of the traffic going through the network
    –  Tools have to be handy
•  Legacy troubleshooting tools are partially useful or completely useless
    –  OAM (Operations, Administration and Maintenance) is not supported by OpenFlow (yet)
    –  Ping, traceroute, SNMP and wireshark/tcpdump are somewhat compromised
•  Deep knowledge of the hardware and software platform is required:
    –  Usage of the "hidden" commands becomes part of your routine
•  Suggestion: get a "premium" support contract
    –  Going through the level 2 TAC team will increase your stress and the network recovery time

Page 8

SDN Topologies: Starting Simple

•  Usually, with just one SDN App, troubleshooting is less complex
    –  One SDN App is connected through an out-of-band network to multiple OF switches
    –  Usually, the SDN App has full control of ports and VLANs
•  A good network sniffer and a Syslog server are the key to success here
    –  Helps validate the OpenFlow messages sent and received
    –  Eases access to error messages

[Figure: a single SDN App in the Application Layer connected over OpenFlow 1.x to four Forwarding Devices, with User A and User B attached.]

Page 9

SDN Topologies: Adding Complexity

•  Different control planes in parallel tend to be a consequence of slicing
    –  More applications to understand and track
    –  Different levels of software stability and debugging
    –  Higher chances of network outages
•  Slicing/Partitioning adds complexity:
    –  OpenFlow communication between the OpenFlow switch and the SDN App is not end-to-end:
        •  OF Switch -> Slicer or Slicer -> OF App
    –  It is complex to track which switch is talking to which SDN App and vice versa
        •  OF doesn't carry the DPID in each OF message
•  "Traditional" sniffers are not enough to track indirect OpenFlow messages
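Because the DPID only appears in the FeaturesReply, a sniffer has to remember it per TCP connection and attach it to every later message. The following is a minimal Python sketch of that idea, not the AmLight sniffer's actual code. The class name `DpidTracker`, the helper `reverse` and the connection-tuple layout are illustrative assumptions; only the OpenFlow 1.0 header layout and the FeaturesReply type value come from the specification. It also assumes the slicer presents the switch's DPID on its own connection to the SDN App, so the same logic can be applied on both legs.

```python
import struct

OFP_HEADER = struct.Struct("!BBHI")  # version, type, length, xid
OFP_VERSION_1_0 = 0x01
OFPT_FEATURES_REPLY = 6              # OpenFlow 1.0 message type


def reverse(conn):
    """Return the same TCP connection key seen from the opposite direction."""
    src_ip, src_port, dst_ip, dst_port = conn
    return (dst_ip, dst_port, src_ip, src_port)


class DpidTracker:
    """Remembers which datapath ID (DPID) was announced on each TCP connection."""

    def __init__(self):
        self._dpid_by_conn = {}

    def observe(self, conn, message):
        """Inspect one OpenFlow message and return the DPID of its connection.

        `conn` is (src_ip, src_port, dst_ip, dst_port) of the captured segment
        (a hypothetical key layout). Returns None until a FeaturesReply has
        been seen on that connection.
        """
        if len(message) >= OFP_HEADER.size + 8:
            version, msg_type, _length, _xid = OFP_HEADER.unpack_from(message)
            if version == OFP_VERSION_1_0 and msg_type == OFPT_FEATURES_REPLY:
                # datapath_id is the first field after the 8-byte header.
                (dpid,) = struct.unpack_from("!Q", message, OFP_HEADER.size)
                # Store both directions so controller -> switch messages match too.
                self._dpid_by_conn[conn] = dpid
                self._dpid_by_conn[reverse(conn)] = dpid
        return self._dpid_by_conn.get(conn)
```

Messages captured before the FeaturesReply cannot be attributed, which is one reason a capture that starts mid-session is harder to read than one that includes the handshake.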

[Figure: Application Layer (OESS, ONOS/SDN-IP, Testbed) connected over OpenFlow 1.0 to FlowSpace Firewall, which connects over OpenFlow 1.0 to four Forwarding Devices, with User A and User B attached.]

Page 10

Control Plane: what should be monitored?

•  Everything related to the OpenFlow communication:
    –  # of flows installed
        •  Avoid getting close to the documented limits (weird stuff might happen)
    –  Rate of FlowMods, PacketOut/PacketIn & Stats requests per second:
        •  The switch's CPU is directly affected by these rates
    –  # of OFP_FLOW_ERROR messages:
        •  Some messages might indicate that a crash is about to happen (FULL_TABLE)
    –  Flow duration:
        •  Helps to understand traffic disruption due to flows being reinstalled
    –  Flow and Port Counters (bps and pps)
        •  If slicing is a reality, collect counters per slice
•  Most SDN apps don't provide such data; some provide it through REST interfaces
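Two of the checks above, message rates and flow-table headroom, can be sketched in a few lines of Python. This is an illustrative sketch, not a tool from the presentation. The names `MessageRateMonitor` and `flow_table_alert`, the 10-second window and the 80% threshold are all assumptions; the real limit has to come from the switch vendor's documentation.

```python
from collections import Counter, deque


class MessageRateMonitor:
    """Sliding-window rate of OpenFlow messages per type (messages/second)."""

    def __init__(self, window_seconds=10.0):
        self.window = window_seconds
        self._events = deque()  # (timestamp, message type name), oldest first

    def record(self, timestamp, msg_type):
        """Register one observed message; call in timestamp order."""
        self._events.append((timestamp, msg_type))

    def rates(self, now):
        """Return {message type: messages per second} over the last window."""
        # Drop events that fell out of the window before counting.
        while self._events and self._events[0][0] <= now - self.window:
            self._events.popleft()
        counts = Counter(msg_type for _, msg_type in self._events)
        return {msg_type: n / self.window for msg_type, n in counts.items()}


def flow_table_alert(installed_flows, documented_limit, threshold=0.8):
    """True when the flow count reaches `threshold` of the documented limit."""
    return installed_flows >= threshold * documented_limit
```

For example, `flow_table_alert(800, 1000)` returns `True` with the default threshold, while `flow_table_alert(700, 1000)` returns `False`.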

Page 11

Data Plane: what should be monitored?

Data Plane Monitoring:

•  In some cases, everything looks OK, but traffic is not flowing
•  Some possible data plane black holes:
    –  A specific line card or interface discarding all traffic
        •  Due to an interface memory issue, flows are installed but traffic is discarded
    –  Interface down on one side but up on the remote side, and the SDN App doesn't understand that
        •  For instance: 10G LAN-PHY, Ethernet circuits and 100G long-haul circuits
        •  In this case, depending on the side, the SDN App installs circuits pointing to the affected link, discarding all traffic
    –  A specific installed flow entry crashed
        •  Due to an interface memory issue, one specific flow is compromised and traffic is discarded
        •  Depending on the number of OpenFlow switches and flow entries, finding the problem might be extremely time-consuming
•  In these cases, in-band tests are required:
    –  Just a very few SDN Apps test in-band per link
    –  No SDN Apps test in-band per flow

Page 12

Disclaimer:

What you are about to see and hear is AmLight's experience. We are not saying these are the best or recommended methods; they probably are not. Don't try them on your network!

Page 13

Control Plane Monitoring

•  Monitoring the OpenFlow messages with passive packet capture:
    –  Non-intrusive
    –  Almost risk-free
•  Few tools available:
    –  Wireshark/tshark/tcpdump
    –  OpenFlow Flight Recorder
    –  AmLight OpenFlow Sniffer
•  AmLight OpenFlow Sniffer was created to be CLI-based, with support for environments with slicers:
    –  Dissects 100% of OpenFlow 1.0
    –  Doesn't require a GUI or X Window
    –  End-to-end communication visualization
    –  Colors to highlight important fields
    –  Many filters available to optimize troubleshooting!
    –  Source: github.com/jab1982/ofp_sniffer
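To give a feel for what passive dissection involves, here is a minimal Python sketch that splits a captured TCP payload into OpenFlow 1.0 messages using the header's length field. It is not taken from ofp_sniffer. The function name `iter_messages` is an assumption, and the sketch simply stops at a truncated message, whereas a real tool would buffer data across TCP segments.

```python
import struct

OFP_HEADER = struct.Struct("!BBHI")  # version, type, length, xid

# Message type values from the OpenFlow 1.0 specification.
OFP10_TYPES = {
    0: "HELLO", 1: "ERROR", 2: "ECHO_REQUEST", 3: "ECHO_REPLY", 4: "VENDOR",
    5: "FEATURES_REQUEST", 6: "FEATURES_REPLY", 7: "GET_CONFIG_REQUEST",
    8: "GET_CONFIG_REPLY", 9: "SET_CONFIG", 10: "PACKET_IN",
    11: "FLOW_REMOVED", 12: "PORT_STATUS", 13: "PACKET_OUT", 14: "FLOW_MOD",
    15: "PORT_MOD", 16: "STATS_REQUEST", 17: "STATS_REPLY",
    18: "BARRIER_REQUEST", 19: "BARRIER_REPLY",
    20: "QUEUE_GET_CONFIG_REQUEST", 21: "QUEUE_GET_CONFIG_REPLY",
}


def iter_messages(payload):
    """Yield (type_name, xid, body) for each complete OpenFlow message."""
    offset = 0
    while offset + OFP_HEADER.size <= len(payload):
        _version, msg_type, length, xid = OFP_HEADER.unpack_from(payload, offset)
        end = offset + length
        if length < OFP_HEADER.size or end > len(payload):
            break  # malformed or truncated message: stop instead of guessing
        name = OFP10_TYPES.get(msg_type, "UNKNOWN(%d)" % msg_type)
        yield name, xid, payload[offset + OFP_HEADER.size:end]
        offset = end
```

One TCP segment often carries several OpenFlow messages back to back, so walking the payload by the length field matters more than it may seem at first.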

[Figure: sliced topology (OESS, ONOS/SDN-IP and Testbed over FlowSpace Firewall, OpenFlow 1.0 on both legs, four Forwarding Devices, User A and User B) with the annotation "Monitor msgs: OpenFlow Sniffer, OFFR" attached through libpcap.]

Page 14

Control Plane Monitoring [2]

•  Monitoring all applications and counters in a centralized NMS:
    –  Scripts collect info from the SDN Apps' REST interfaces and export it via JSON
    –  Zabbix imports the JSON data and saves it into a MySQL database
    –  Currently collecting data from OESS, ONOS, FSFW and switches
    –  Examples:

[Figure: sliced topology (OESS, ONOS/SDN-IP and Testbed over FlowSpace Firewall, OpenFlow 1.0 on both legs, four Forwarding Devices, User A and User B) with the annotation "Monitoring: Zabbix + customized scripts", collecting through SNMP, REST, Java API, etc.]
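The collection scripts themselves are not shown in the slides. As a rough illustration of the "collect, aggregate per slice, export as JSON" step, a Python sketch could look like the following. The input record layout (keys `slice`, `byte_count`, `packet_count`) and the function names are hypothetical; they are not the REST output of OESS, ONOS or FSFW, and the Zabbix import side is not modeled.

```python
import json


def summarize_per_slice(flows):
    """Aggregate flow counters per slice.

    `flows` is a list of dicts with the hypothetical keys
    'slice', 'byte_count' and 'packet_count'.
    """
    summary = {}
    for flow in flows:
        entry = summary.setdefault(
            flow["slice"], {"flows": 0, "bytes": 0, "packets": 0}
        )
        entry["flows"] += 1
        entry["bytes"] += flow["byte_count"]
        entry["packets"] += flow["packet_count"]
    return summary


def export_json(summary):
    """Serialize the summary with a stable key order for the NMS to import."""
    return json.dumps(summary, sort_keys=True)
```

Keeping the aggregation separate from the export makes it easy to feed the same numbers to a different NMS later.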

Page 15

Data Plane Monitoring

•  Most SDN Apps use LLDP or BDDP for topology discovery
    –  Once the topology is discovered, these protocols are not used to monitor the topology
    –  Also, the interval between LLDP/BDDP packets is not appropriate for link monitoring
•  An in-band testing approach is needed to validate the Data Plane
    –  OESS does this through its Forwarding Verification module
    –  Most other SDN Apps don't have anything equivalent
•  Even though OESS/FVD validates the data path, it doesn't validate users' flows
    –  A full port issue is detected, but a single flow issue is not
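The bookkeeping behind a per-link in-band test can be sketched as follows. This is a generic illustration in Python, not the OESS Forwarding Verification implementation. The class `LinkProbeMonitor`, the probe interval and the "three missed probes" rule are assumptions chosen for the example; sending and receiving the probes (for instance via PacketOut/PacketIn) is left out.

```python
class LinkProbeMonitor:
    """Flags links whose in-band probes stopped arriving."""

    def __init__(self, probe_interval, max_missed=3):
        # A link is considered failed after `max_missed` silent intervals.
        self.timeout = probe_interval * max_missed
        self._last_seen = {}

    def expect(self, link, now):
        """Start watching `link`; it counts as healthy at time `now`."""
        self._last_seen.setdefault(link, now)

    def probe_received(self, link, now):
        """Record that a probe crossed `link` at time `now`."""
        self._last_seen[link] = now

    def failed_links(self, now):
        """Return the watched links with no probe for longer than the timeout."""
        return sorted(
            link for link, seen in self._last_seen.items()
            if now - seen > self.timeout
        )
```

Because the probe crosses the data plane, this catches the "interface up but discarding traffic" case that port status alone misses. It still says nothing about an individual user flow.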

[Figure: sliced topology (OESS, ONOS/SDN-IP and Testbed over FlowSpace Firewall, OpenFlow 1.0 on both legs, four Forwarding Devices, User A and User B) with the annotation "Monitoring Data plane: Trunk ports: OESS FWD".]

Page 16

Data Plane Monitoring [2]

•  Monitoring individual flows is important but extremely complex
    –  Being proactive with all flows is desirable, but the interval between tests and the number of flows need to be taken into consideration
    –  Using a reactive approach is the best suggestion
        •  Users won't be happy, but your switches won't crash
•  Approaches to test users' flows are still considered experimental
    –  An SDNTrace protocol was proposed:
    –  http://sdntrace-protocol.readthedocs.io/en/latest/
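The trade-off between test interval and number of flows is simple arithmetic, sketched here in Python. The function names and the assumption of one probe per flow per test round are illustrative; the rate a given switch can tolerate is something only testing and vendor documentation can answer.

```python
def probe_rate(num_flows, interval_seconds, probes_per_flow=1):
    """Probes per second needed to test every flow once per interval."""
    return num_flows * probes_per_flow / interval_seconds


def min_interval(num_flows, max_probes_per_second, probes_per_flow=1):
    """Shortest test interval that stays within a probe-rate budget."""
    return num_flows * probes_per_flow / max_probes_per_second
```

With the roughly 800 flows mentioned earlier and a 10-second interval, `probe_rate(800, 10)` gives 80 probes per second on top of normal control-plane traffic. Capping the budget at 20 probes per second stretches the interval to `min_interval(800, 20)`, which is 40 seconds.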

[Figure: sliced topology (OESS, ONOS/SDN-IP and Testbed over FlowSpace Firewall, OpenFlow 1.0 on both legs, four Forwarding Devices, User A and User B) with the annotation "Monitoring User Flows: SDNTrace".]

Page 17

Data Plane Monitoring [3]

•  AmLight developed its own SDNTrace to test users' flows without changing them
    –  Works through GUI or REST
    –  Very lightweight
    –  Very "cheap": only two to four flow entries needed
    –  Traces L2 and L3 flows
    –  Still under evaluation at AmLight
    –  Developed in collaboration with the Academic Network of Sao Paulo/Brazil
•  Tracing a circuit is done in seconds instead of many minutes, and it can work with both Zabbix and Nagios

Github: github.com/amlight/SDNTrace

Page 18

Future

•  New tools/scripts/protocols are still needed
    –  Still a long and painful journey ahead
    –  OpenFlow-OAM?
•  Improvements to OpenFlow agents are being constantly released
    –  But new bugs are coming with them :(
•  Some SDN monitoring-only applications are being proposed and developed
    –  AmLight is developing its own SDN Looking Glass to consolidate all passive and active monitoring activities associated with the SDN environment (to be released by January)
    –  But side applications are not ideal: it is important that all SDN Applications incorporate troubleshooting capabilities in their core!

Page 19

Off-topic: Suggestions to Network Engineers

•  What is/will be our position description?
    –  Network Engineers? SDN Engineers? Research Network Engineers?
    –  Maybe Network Engineers 2.0?
    –  The description doesn't matter; what matters is that we have to evolve!
•  With SDN, troubleshooting is very different: instead of using CLI and sniffers, we need to read code and application logs
•  Most of us hate software development, but it is time to change our mentality
    –  At AmLight, I don't remember the last time I created a VLAN using a CLI
•  If SDN becomes the next de facto standard, it will happen in a few years
    –  We all still have time to learn and get prepared for this new reality
•  Recommendations:
    –  Learn Python or Java (JavaScript is a plus)
        •  Ryu is a very interesting OpenFlow controller to start with
    –  Join the Ryu or ONOS mailing lists
    –  Mininet is your friend!

Page 20

Jeronimo Bezerra, Florida International University

<[email protected]>

Internet2 Technology Exchange, Sep 26th

Troubleshooting AmLight: Handling Network Events in a Production SDN Environment

Questions???