17830439 Ethernet and MPLS WP

  • 7/28/2019 17830439 Ethernet and MPLS WP

    Overview

    This paper describes the Ethernet and Multi-Protocol Label Switching (MPLS) tools and procedures used to accomplish Operations, Administration, and Maintenance (OAM). This functionality addresses the fault management aspects of the Fault, Configuration, Accounting, Performance, Security (FCAPS) model as defined by the ITU-T Telecommunication Management Network (TMN), as shown in Figure 1.

    Recent enhancements to Ethernet and MPLS have added carrier-class OAM features for monitoring, detecting, verifying, isolating, and repairing faults, with appropriate notifications to network administrators. These enhancements enable network operators to deploy timesaving, automated, self-healing practices, as well as on-demand diagnostics and troubleshooting techniques. The purpose of OAM is to improve revenue growth and profitability for service providers, as outlined in Figure 2.

    This white paper describes the OAM features in the context of the objectives above, and the unique benefits of Ciena's solution.

    OAM Process Flow

    Figure 3 describes the service provider process flow when faults appear in the network, starting with the fault and ending after verification of the repair. Each step must be optimized to protect both the service provider and the subscriber.

    Fault Detection

    Fault detection includes mechanisms to detect faults at the device control plane or data plane level. Faults must be detected quickly enough to minimize Time to Recover (TTR). However, detection should be based on an observation window large enough to avoid false fault detections. For example, a control plane can become non-responsive for a few microseconds while handling a burst of interrupts. As long as the control plane is restored to a normal state within an acceptable time window, the network element does not experience a software failure. OAM handles a wide range of failure scenarios that vary in nature and location, from a software defect to a backhoe tearing apart a fiber conduit by mistake.
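
    The trade-off between quick detection and false detections can be illustrated with a consecutive-miss counter. This is a minimal sketch, not a description of any Ciena implementation; the class name `FaultDetector`, the helper `run`, and the threshold of three missed polls are hypothetical choices made for illustration.

```python
class FaultDetector:
    """Declare a fault only after `misses_to_fail` consecutive missed polls.

    Hypothetical sketch: a short control-plane stall (fewer misses than the
    threshold) is tolerated, while a sustained outage is reported.
    """

    def __init__(self, misses_to_fail=3):
        self.misses_to_fail = misses_to_fail
        self.consecutive_misses = 0
        self.faulted = False

    def observe(self, responded):
        """Record one poll result and return the current fault state."""
        if responded:
            # Any successful poll resets the observation window.
            self.consecutive_misses = 0
            self.faulted = False
        else:
            self.consecutive_misses += 1
            if self.consecutive_misses >= self.misses_to_fail:
                self.faulted = True
        return self.faulted


def run(polls, misses_to_fail=3):
    """Feed a sequence of poll results; return the fault state after each."""
    detector = FaultDetector(misses_to_fail)
    return [detector.observe(p) for p in polls]
```

    A larger threshold widens the observation window and suppresses more transient stalls, at the cost of a longer time to detect a real failure.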

    There are three major categories of failure:

    > Link failure

    > Service transport failure

    > SLA failure

    Ethernet and MPLS OAM: Operations, Administration and Maintenance

    White Paper

    Figure 1. FCAPS model (OAM covers Fault Management within the FCAPS functions of Fault, Configuration, Accounting, Performance, and Security Management, across the TMN layers)

    Legend:

    NEL: Network Element Layer (devices)
    EML: Element Management Layer (device-level functions)
    NML: Network Management Layer (topology management)
    SML: Service Management Layer (Service Level Agreements (SLAs))
    BML: Business Management Layer (budgeting and billing)

    Objectives

    > Protecting revenue by preventing service outages and offering faster service restoration

    > Maximizing revenue growth by enabling richer service offerings

    > Reducing operational costs by cutting repair costs and operational overhead

    Figure 2. OAM objectives

    Figure 3. OAM process flow: Fault, Fault Detection, Fault Notification, Fault Verification, Fault Isolation, Repair, Repair Verification

    Link Failure

    Link failure represents either the complete failure of a link or the performance of a link degrading below an acceptable level. The causes may include an optical transceiver failure at either end of the link, dust or other impurities in the connector, a fiber cut between the elements, or element failure at the other end of the link.

    Service Transport Failure

    Ethernet services can be transported natively, using Virtual Local Area Networks (VLANs) (IEEE 802.1Q) or stacked VLANs (IEEE 802.1ad), or over MPLS tunnels and MPLS Virtual Circuits (VCs). Each of these transport mechanisms can fail due to software failure, memory corruption, or simple misconfiguration.

    Service Level Agreement Failure

    The SLA describes the characteristics of the services provided by carriers to their subscribers. Adherence to the SLA can be measured using one or more of the following metrics:

    > Frame Delay: delay experienced by the traffic carried by the service

    > Frame Delay Variation: variation in that delay

    > Frame Loss: percentage of frames passed through the service that were dropped by the network

    > Service Availability: percentage of time when the service is available to the subscriber
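
    The metrics above reduce to simple arithmetic on counters and delay samples. The sketch below is illustrative only; the function names are hypothetical, and computing delay variation as the difference between consecutive delay samples is one simple interpretation, not a definition taken from this paper.

```python
def frame_loss_pct(frames_sent, frames_received):
    """Percentage of frames sent into the service that were not delivered."""
    if frames_sent == 0:
        return 0.0
    return 100.0 * (frames_sent - frames_received) / frames_sent


def frame_delay_variation(delays_ms):
    """Absolute differences between consecutive delay samples (ms).

    Assumption: variation is measured between consecutive frames.
    """
    return [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]


def availability_pct(available_seconds, total_seconds):
    """Percentage of the measurement period the service was available."""
    return 100.0 * available_seconds / total_seconds
```

    For instance, 990 frames delivered out of 1000 sent corresponds to 1% frame loss, and 3564 available seconds in an hour corresponds to 99% availability.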

    Monitoring these SLA parameters provides indications of fault or performance issues. The Metro Ethernet Forum (MEF) and the ITU-T are defining standards for performance management of Ethernet services. This white paper focuses on the fault management aspect of SLA failures. SLA failures can be caused by link failures, such as a failing optical transceiver resulting in partial packet loss, or a service transport failure, such as a software failure leading to incorrect forwarding tables.

    Fault Notification

    Once detected by the network element layer, the fault needs to be conveyed to the entities that will work toward repairing the fault. Such entities can require either human or automated servicing, such as the manual replacement of a faulty transceiver, or a Rapid Spanning Tree Protocol (RSTP) reconvergence after a link failure, respectively. In any case, fault notification should be:

    > Responsive: the time saved will protect revenue and may avoid penalties.

    > Meaningful: a mere "link down" Simple Network Management Protocol (SNMP) trap sent when an optical transceiver fails is insufficient. A trap containing information regarding the faulty transceiver and the reason for the failure reduces troubleshooting cost.

    Ciena's Carrier Ethernet Service Delivery (CESD) switches are optimized to enable network reconvergence below 50 ms. These enhancements allow Ethernet service-delivery networks based on Ciena products to support critical, time-sensitive applications with the same SLAs and guarantees of SONET/SDH optical rings. This level of performance is achieved, in part, by providing high-priority, interrupt-based failure detection, shielding services from link-level failures.

    Ciena's True Carrier Ethernet™ offerings are the only access/metro edge solutions that enable service providers to deploy any mix of Ethernet and MPLS-based service transports over a common infrastructure. This allows service providers to migrate easily from Ethernet to MPLS access deployments and extend the services and capabilities of an MPLS core network directly to subscribers, with no additional capital investment required.

    Ciena, through the early adoption of IEEE 802.1ag Connectivity Fault Management (CFM), provides VLAN-based service transport OAM. The combination of Label Switched Path (LSP) ping, LSP traceroute, Virtual Circuit Connection Verification (VCCV), Bi-directional Forwarding Detection (BFD), and Fast ReRoute (FRR) provides comprehensive MPLS-based service transport OAM.

    Ciena's CESD switches offer intelligent classification and queue servicing, which minimizes frame delay and variation. In addition, Ciena provides a unique set of self-healing techniques at the link and service transport layers, to minimize SLA failures relating to frame loss and service availability.

    > Concise: sending multiple traps with redundant failure information will obfuscate the real cause of the failure and slow down the fault isolation step.

    Fault Verification

    After notification, the Network Operation Center (NOC) engineer should verify the fault, and determine whether the condition persists. By the time the link fail indication is received, the Ethernet network will have reconverged. Under most conditions, failover and restoration with Ciena's Carrier Ethernet Service Delivery devices takes less than 50 ms. Fault verification using on-demand OAM techniques eliminates false failure indications. Not verifying the validity of the fault could lead the network operator to try to isolate a failure that does not exist.

    Fault Isolation

    Fault isolation consists of determining the exact source, location, and nature of the fault, including the specific network element(s) and network layer(s) experiencing the fault. A failure at a low level may impact higher levels and lead to additional failures. For example, a link failure can lead to broken MPLS tunnel connectivity, also impacting all of the MPLS VCs that tunnel carries.

    Notification of a low-level failure can be followed or surrounded by higher-level failure notifications. This process makes fault isolation more difficult, time-consuming, and costly. Features such as alarm correlation help minimize the cost of isolating a fault by decreasing the number of fault notification messages.
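
    The idea behind alarm correlation can be sketched with the link, tunnel, and VC example above: an alarm is suppressed when a resource beneath it is also in alarm. This is a minimal illustration under stated assumptions, not the algorithm used by any particular management system; the function name `correlate`, the resource names, and the single-parent dependency map (assumed to contain no cycles) are hypothetical.

```python
def correlate(alarms, depends_on):
    """Return only the root-cause alarms, sorted by resource name.

    `alarms` is a set of failed resource names. `depends_on` maps a resource
    to the lower-level resource that carries it (assumed acyclic). An alarm
    is suppressed when any resource below it in the chain is also alarmed.
    """
    roots = []
    for resource in sorted(alarms):
        parent = depends_on.get(resource)
        suppressed = False
        while parent is not None:
            if parent in alarms:
                suppressed = True
                break
            parent = depends_on.get(parent)
        if not suppressed:
            roots.append(resource)
    return roots


# Hypothetical topology: two VCs ride one tunnel, which rides one link.
DEPENDS_ON = {"vc-a": "tunnel-1", "vc-b": "tunnel-1", "tunnel-1": "link-1"}
```

    With this topology, simultaneous alarms on the link, the tunnel, and both VCs reduce to a single notification for `link-1`.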

    Repair

    Depending on the efficiency of the OAM process, repair and preventative maintenance can occur at different stages:

    > After the fault impacts the service. Time-to-repair is most critical, as the network operator needs to remedy the problem quickly to restore the service. Ciena's True Carrier Ethernet solutions provide modularity in the network elements, enabling the network operator to change only the failed element, saving time and eliminating impacts to other services. For example, risk of error is eliminated because the failure of a hot-swappable transceiver does not require the replacement and re-cabling of the entire network element.

    > Before the fault impacts the service. Redundancy enables proactive maintenance, significantly reducing service outage times. Ciena's modular solution, coupled with redundant links, control modules, power supplies, and fans, allows non-invasive repair of network components, protecting the services the components carry. For example, the failure of a redundant control module will lead only to non-invasive switchover to the standby module.

    > Before the fault leads to an element or network failure, such as a performance degradation scenario. By continuously monitoring key metrics relating to element and network health, service providers can schedule maintenance preemptively, thereby using fewer resources.

    Repair Verification

    After a remedy is enacted, the same on-demand OAM mechanisms used during fault verification confirm that the fault no longer exists. For example, an IP ping can be used both to verify IP connectivity faults on the control plane and to confirm that connectivity has been restored.

    Ciena provides a comprehensive solution for optimum fault notification, including high-priority generation of SNMP traps with content focused on the failure source. In addition, Ciena's Ethernet Services Manager (ESM) solution offers alarm correlation capabilities, enabling network operators to associate alarms to more quickly isolate the cause of the fault.

    Ciena offers a complete on-demand OAM solution, enabling the network operator to conduct layer-by-layer fault isolation (link, service transport, and SLA layers). Figure 4 shows the extent of the various OAM mechanisms useful for isolating faults.

    Figure 4. Major network fault categories (Service Agreement Layer, Service Transport Layer, and Link Layer, shown for both Ethernet and MPLS)

    OAM Protocols

    With the addition of comprehensive OAM capabilities, Ethernet and MPLS offer a complete feature set that allows carriers to maximize Ethernet-based service revenue. IEEE, IETF, ITU-T, and MEF now describe mechanisms that report the status of a given end-to-end service, representing a subscriber-centric view of the network, and provide link connectivity information, representing a provider-centric view of the network. Figure 5 offers a high-level view of these mechanisms against the OAM process flow and different failure categories.

    IEEE 802.3ah Ethernet in the First Mile (EFM) OAM

    EFM OAM, described in Figure 6, provides link-layer mechanisms that complement applications that may reside in higher layers (such as IEEE 802.1ag or MEF Service OAM). EFM OAM, also called link OAM, encompasses a simple protocol that operates across a single link.

    Thresholds are configured to monitor signal degradation, such as frame errors. Messages are passed across the link to communicate statistics regarding link health. When a failing link is detected, SNMP communicates this to management stations. In addition, the link may be taken out of service and placed in remote loopback mode for fault isolation. Prior to placing a link in service, EFM OAM may be used to test the performance of the link. Once verified to be operational and error-free, the link is taken out of remote loopback and placed in service. Standby links may be tested continuously prior to being activated by protocols such as IEEE 802.1w RSTP or IEEE 802.1aq Shortest Path Bridging.

    IEEE 802.1ag Connectivity Fault Management

    Building upon IEEE 802.3ah EFM OAM, IEEE 802.1ag CFM specifies capabilities for detecting, isolating, and reporting connectivity faults for VLAN-based service transport networks. CFM operates at both the physical and logical levels, monitoring and troubleshooting faults. For instance, CFM can monitor physical links between adjacent or distant devices. In addition, fault monitoring between two end-points can be configured based on a logical network layer (such as per-VLAN). Key CFM features are shown in Figure 7.

    The CFM protocol, often called Ethernet OAM, sends heartbeat-style Continuity Check Messages (CCMs). Failure to receive these messages, in order, in a certain amount of time indicates one or more possible network errors, including path or device failure or network configuration problems. Management stations monitor the status of the reception of CCMs and take appropriate action.
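
    The two checks described above, arrival within a time window and arrival in order, can be sketched as follows. This is an illustration only; the function names are hypothetical, and the default window of 3.5 CCM intervals is an assumption based on the commonly cited IEEE 802.1ag loss-of-continuity threshold rather than a value stated in this paper.

```python
def continuity_lost(last_ccm_time, now, ccm_interval, loss_multiplier=3.5):
    """Return True when no CCM has arrived within the allowed window.

    All times are in seconds. `loss_multiplier` is an assumed default.
    """
    return (now - last_ccm_time) > loss_multiplier * ccm_interval


def ccm_in_order(previous_seq, received_seq):
    """Return True when the CCM sequence number advanced by exactly one."""
    return received_seq == previous_seq + 1
```

    With a one-second CCM interval, a remote end-point that has been silent for 3.6 seconds is reported as lost, while one silent for 3.0 seconds is not.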

    Figure 5. OAM protocols matrix (rows: Service Level Agreement, Service Transport, and Link; columns follow the OAM process flow steps; protocols shown include MEF Service OAM, VCCV/BFD, LSP Ping, MSTP, IEEE 802.1ag CFM/ITU-T Y.1731, IEEE 802.3ah EFM OAM, and SNMP)

    IEEE 802.3ah EFM OAM

    Features                            Benefits
    Auto-discovery                      Eliminates the need for operator configuration
    Uni-directional Fault Signaling     Enables the detection of a one-way link failure
    Remote Loopback                     Provides on-demand link diagnostics, including bit-error rate approximation
    Link Monitoring                     Offers proactive, traffic-based threshold link monitoring
    Critical Events                     Supports communication of network element conditions that may cause link failure, including power and temperature
    Layer 2 Variable Retrieval          Allows supplemental link statistics collection, augmenting SNMP
    Organization-specific Extensions    Enables standards development organizations and vendors to expand scope

    Figure 6. IEEE 802.3ah EFM OAM

    IEEE 802.1ag CFM

    Features                              Benefits
    Continuity Check                      Continuously verifies VLAN connectivity and may indicate network faults or misconfigurations
    Loopback Request (MAC ping)           Offers on-demand or proactive indication of VLAN control-plane responsiveness
    Linktrace Request (MAC traceroute)    Provides on-demand or proactive VLAN topology information

    Figure 7. IEEE 802.1ag CFM

    Troubleshooting tools are provided in the form of Media Access Control (MAC) ping (formally known as IEEE 802.1ag Loopback Request) and MAC traceroute (formally known as IEEE 802.1ag Linktrace Request). Network operators may initiate these features, or the features may run automatically as monitoring functions in background processes.

    Since CFM is being developed after completion of the IEEE 802.1ad Provider Bridges protocol, a second important aspect of the project allows multiple nested Maintenance Domains (MDs) to coexist on the same physical network, each potentially managed by a different administrative organization (service provider or network operator).

    ITU-T Y.1731

    ITU-T Study Group 13 developed Y.1731 in cooperation with IEEE 802.1ag CFM, further defining VLAN-based service transport OAM functionality. Several additional features offer performance monitoring capabilities. ITU-T Y.1731 and CFM use an identical frame format and share the same operation code (OpCode) space. As a result, these complementary protocols are simpler to deploy in a service provider's network. Figure 8 provides a summary of the features contained in Y.1731.

    VLAN-based service transport networks configure certain network elements as Maintenance End Points (MEPs). These MEPs sit at the boundaries of Ethernet domains. Figure 9 shows the span of the different OAM mechanisms offered by Y.1731.

    MPLS

    MPLS deployed to the customer premises facilitates the interconnection of the access infrastructure with the existing MPLS core network, while increasing the need for MPLS-specific OAM tools. Further description of MPLS is shown in Figures 10 and 11.

    LSP Ping

    LSP ping is an in-band, on-demand mechanism to verify the status of an MPLS tunnel. An LSP can fail because of misconfigurations (such as disabled MPLS, mismatched labels, or routing into the wrong tunnel), broken Label Distribution Protocol (LDP) adjacencies, corruption of Forwarding Information Bases (FIBs), or other software/hardware failures. LSP ping sends an echo request to a target Label Switch Router (LSR) using MPLS addressing. To prevent the IP packet from being routed to its destination, the destination IP address of the echo request packet is taken from the 127.0.0.0/8 range. If reached, the destination LSR sends an echo reply back to the originator of the MPLS echo request.
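
    The addressing rule above can be checked with Python's standard `ipaddress` module. This is a small sketch for illustration; the function name `is_valid_echo_destination` is hypothetical and is not part of any LSP ping implementation.

```python
import ipaddress

# Addresses in this range are never forwarded by ordinary IP routing, so an
# echo request that leaves the LSP is not delivered by IP forwarding.
LOOPBACK_RANGE = ipaddress.ip_network("127.0.0.0/8")


def is_valid_echo_destination(address):
    """Return True when `address` falls inside 127.0.0.0/8."""
    return ipaddress.ip_address(address) in LOOPBACK_RANGE
```

    For example, `127.0.0.1` passes the check, while a routable address such as `10.0.0.1` does not.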

    ITU-T Y.1731

    Features                              Benefits
    Alarm Indication Signal (ETH-AIS)     Provides fault notification for devices not participating in the VLAN-based Ethernet Continuity Check
    Remote Defect Indication (ETH-RDI)    Offers fault indication of the other end of a VLAN-based Ethernet service
    Locked Signal (ETH-LCK)               Enables maintenance actions while differentiating and isolating actual fault conditions
    Test Signal (ETH-Test)                Allows a one-way, on-demand, in-service or out-of-service VLAN test, such as throughput or frame loss
    Performance Monitoring (ETH-PM)       Monitors traffic performance on a point-to-point, end-to-end, VLAN-based Ethernet service
    Frame Loss Measurement (ETH-LM)       Collects end-to-end frame loss information to approximate severely errored seconds, which indicate VLAN-based service transport availability
    Frame Delay Measurement (ETH-DM)      Provides an on-demand Frame Delay and Frame Delay Variation measurement between two points of the VLAN-based service

    Figure 8. ITU-T Y.1731

    Figure 9. ITU-T Y.1731 architecture (ETH-PM, ETH-AIS, ETH-RDI, ETH-LCK, ETH-Test, ETH-LM, and ETH-DM spanning the Ethernet network between MEPs, with CE devices attached at the UNIs)

    Ciena offers a solution allowing transport of Ethernet services, either natively or using MPLS encapsulation.

    MPLS

    Features                                   Benefits
    Label Switched Path Ping                   Offers on-demand connectivity information about MPLS tunnels
    LSP Traceroute                             Provides MPLS switching and Maximum Transmission Unit (MTU) configuration information
    Virtual Circuit Connection Verification    Enables proactive connectivity monitoring of MPLS pseudowires
    Bi-directional Forwarding Detection        Allows scalable, proactive data-plane verification of MPLS LSPs
    Fast ReRoute                               Provides automated repair of MPLS failures

    Figure 10. MPLS OAM

    LSP Traceroute

    LSP traceroute determines the hop-by-hop path and destination of an LSP. Like LSP ping, traceroute is an in-band, on-demand MPLS OAM utility that uses an MPLS echo request/reply mechanism to detect MTU misconfiguration between LSRs. However, with LSP traceroute, all LSRs along the path, up to and including the destination LSR, reply to the echo request. This technique allows the operator to identify and distinguish LSRs along a path.
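
    The hop-by-hop behavior can be simulated in a few lines. The paper does not state how each LSR is made to reply; the sketch below assumes the usual traceroute approach of sending echo requests with increasing TTL values, so that the request expires at successive LSRs. The function name and LSR names are hypothetical.

```python
def lsp_traceroute(lsp_path):
    """Simulate LSP traceroute over `lsp_path`.

    `lsp_path` lists LSR names from the first hop to the destination LSR.
    One echo request is sent per TTL value (an assumption, see above); the
    LSR at which the TTL expires sends the echo reply.
    """
    replies = []
    for ttl in range(1, len(lsp_path) + 1):
        remaining = ttl
        for lsr in lsp_path:
            remaining -= 1          # each LSR decrements the TTL
            if remaining == 0:      # TTL expired here, so this LSR replies
                replies.append((ttl, lsr))
                break
    return replies
```

    Calling `lsp_traceroute(["lsr-a", "lsr-b", "lsr-c"])` yields one reply per hop, in path order, ending with the destination LSR.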

    Virtual Circuit Connection Verification

    Using LSP ping, a service provider can monitor the status of an MPLS tunnel. To diagnose a problem within the tunnel, the service provider needs a mechanism to verify the connectivity of the pseudowires (VCs). VCCV allows proactive monitoring of pseudowires within MPLS tunnels by establishing a control channel associated with each pseudowire.

    Bi-directional Forwarding Detection

    VCCV requires involvement of the MPLS control plane; as the number of VCs increases, so will the load on the control plane. BFD allows systematic and more scalable detection of MPLS LSP data plane failures, with less involvement from the control plane. As a result, BFD allows faster detection on a larger number of LSPs. BFD relies on a hello packet exchanged by neighbors at negotiated, regular intervals. When a hello packet is not received as expected, the neighbor is declared down.
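
    The negotiation and the resulting detection time can be worked through numerically. The formula below follows the asynchronous-mode calculation in RFC 5880, which this paper does not spell out, so treat it as an assumption; the function and parameter names are hypothetical.

```python
def detection_time_ms(local_required_min_rx_ms, remote_desired_min_tx_ms,
                      remote_detect_mult):
    """Time without hello packets after which the neighbor is declared down.

    The receive interval is negotiated as the slower of what the local
    system can accept and what the remote system wants to send.
    """
    negotiated_interval_ms = max(local_required_min_rx_ms,
                                 remote_desired_min_tx_ms)
    return remote_detect_mult * negotiated_interval_ms


def neighbor_is_down(ms_since_last_hello, detection_ms):
    """Return True once the silence exceeds the detection time."""
    return ms_since_last_hello > detection_ms
```

    For example, if the local system accepts hellos every 10 ms but the remote system sends every 50 ms with a multiplier of 3, the neighbor is declared down after more than 150 ms of silence.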

    Fast ReRoute

    Fast ReRoute allows automated repair of LSP tunnels to reduce packet loss on LSPs. If there is a link or node failure, an LSP employing Fast ReRoute can redirect MPLS traffic to previously computed and established alternate paths around the failed link or node. The alternate paths are selected during the establishment of a primary LSP under hop-by-hop control. With Fast ReRoute enabled, Resource ReSerVation Protocol-Traffic Engineering (RSVP-TE) establishes local alternate LSPs for each potential point of failure along the primary path.
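
    Precomputing local alternate paths can be sketched on a small graph. The example covers link protection only (a detour around each link of the primary path) and uses a plain breadth-first search; it is not how RSVP-TE signals backup LSPs. The function names and the four-node topology are hypothetical.

```python
from collections import deque


def shortest_path(graph, src, dst, excluded_link):
    """Breadth-first search from src to dst that never uses excluded_link."""
    banned = {excluded_link, excluded_link[::-1]}
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph[node]:
            if (node, neighbor) not in banned and neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # no detour exists for this link


def precompute_detours(graph, primary_path):
    """Map each link of the primary path to a detour around that link."""
    detours = {}
    for u, v in zip(primary_path, primary_path[1:]):
        detours[(u, v)] = shortest_path(graph, u, v, (u, v))
    return detours


# Hypothetical topology: adjacency lists for four LSRs.
GRAPH = {"A": ["B", "C"], "B": ["A", "C", "D"],
         "C": ["A", "B", "D"], "D": ["B", "C"]}
```

    Because the detours are computed when the primary LSP is set up, the node upstream of a failed link can switch traffic immediately instead of waiting for a new path computation.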

    MEF Service OAM

    The MEF is pursuing a complementary set of OAM-related functions operating at the SLA layer. The Phase 1 specification will contain performance monitoring capabilities for point-to-point services reflecting the frame loss ratio, frame delay (latency), and frame delay variation (jitter) characteristics of the service, as shown in Figure 12.

    In addition, per-service fault management will be supported for point-to-point, point-to-multipoint, and multipoint services. Fault detection encompasses loss of continuity between management end-points and detection of the potential for loops in the service. This fault detection/verification capability is supported proactively or on demand through operator action. MEF Service OAM, often called Service OAM, also provides fault isolation and fault notification.

    IP

    Ethernet services offer the benefit of low deployment costs by not requiring IP provisioning of each individual data plane element. However, the control plane uses mostly IP-based protocols, such as Telnet, SNMP, or IGMP. In that regard, control plane failures must be detected at the IP level. Two mechanisms have been in use since the advent of IP networking: IP ping, which provides on-demand connectivity verification of the IP control plane, and IP traceroute, which offers routing and delay information for an IP destination.

    Figure 11. Basic MPLS constructs (VC A and VC B carried within an MPLS tunnel)

    MEF Service OAM

    Features                                                              Benefits
    Point-to-point Ethernet Virtual Circuit (EVC) Performance Monitoring (PM)    Provides SLA assurance for different services
    Point-to-multipoint EVC PM                                            Provides SLA assurance for different services
    Multipoint-to-multipoint EVC PM                                       Provides SLA assurance for different services
    EVC Fault Management                                                  Enables identification and isolation of faults at the SLA layer

    Figure 12. MEF Service OAM

    IP Ping

    IP ping is a basic mechanism that verifies IP connectivity through the network. It verifies that a given IP address exists, is reachable, and can accept ping requests, and calculates the latency between the control planes of two IP network elements.

    IP Traceroute

    IP traceroute is another OAM tool that records and displays the IP message route between two IP elements. It also calculates the latency between the control planes of each IP element of the route.

    Conclusion

    Ciena's Carrier Ethernet Service Delivery solution, described in Figure 13, enables service providers to operate, administrate, and maintain any mix of Ethernet and MPLS-based L2 VPNs effectively. By leveraging this unique OAM capability, service providers can protect current revenue and maximize revenue growth, while reducing operational costs.

    Objectives and the Carrier Ethernet Service Delivery Solution

    Protects revenue by:

    Preventing service outages:
    > Sub-50 ms automated network reconvergence
    > Robust Quality of Service (QoS) architecture minimizes SLA failures
    > Modular architecture enables planned non-invasive repairs
    > Redundancy for mission-critical network components

    Offering faster service restoration:
    > Generates precise failure information more quickly
    > Service-aware OAM feature set intelligently traverses each layer as needed
    > Complete OAM feature set covers each network layer (link, service transport and SLA)

    Maximizes revenue growth by:

    Enabling richer service offerings:
    > Comprehensive Ethernet and MPLS OAM feature sets
    > Intelligent classification

    Reduces operational costs by:

    Reducing repair costs:
    > Advanced alarm correlation simplifies fault isolation
    > Hot-swappable solution enables shorter and less expensive repairs

    Reducing operational overhead:
    > On-demand OAM techniques eliminate unnecessary investigation of false failure indications
    > Modular solution reduces cost of spares
    > Proactive monitoring enables cost-effective preemptive maintenance

    Figure 13. Ciena's Carrier Ethernet Service Delivery solution

    Specialising in transition to service-driven networks to help you change the way you compete.

    1201 Winterson Road, Linthicum, MD 21090
    1.800.207.3714 (US and Canada)
    1.410.865.8671 (outside US)
    +44.20.7012.5555 (international)
    www.ciena.com

    Ciena may from time to time make changes to the products or specifications contained herein without notice. © 2009 Ciena Corporation. All rights reserved. WP062A4 2.2009