17830439 Ethernet and MPLS WP

7/28/2019 17830439 Ethernet and MPLS WP


Overview

This paper describes the Ethernet and Multi-Protocol Label Switching (MPLS) tools and procedures used to accomplish Operations, Administration, and Maintenance (OAM). This functionality addresses the fault management aspects of the Fault, Configuration, Accounting, Performance, Security (FCAPS) model as defined by the ITU-T Telecommunication Management Network (TMN), as shown in Figure 1.

Recent enhancements to Ethernet and MPLS have added carrier-class OAM features for monitoring, detecting, verifying, isolating, and repairing faults, with appropriate notifications to network administrators. These enhancements enable network operators to deploy timesaving, automated, self-healing practices, as well as on-demand diagnostics and troubleshooting techniques. The purpose of OAM is to improve revenue growth and profitability for service providers, as outlined in Figure 2.

This white paper describes the OAM features in the context of the objectives above, and the unique benefits of Ciena's solution.

OAM Process Flow

Figure 3 describes the service provider process flow when faults appear in the network, starting with the fault and ending after verification of the repair. Each step must be optimized to protect both the service provider and the subscriber.

Fault Detection

Fault detection includes mechanisms to detect faults at the device control plane or data plane level. Faults must be detected quickly enough to minimize Time to Recover (TTR). However, detection should be based on an observation window large enough to avoid false fault detections. For example, a control plane can become non-responsive for a few microseconds while handling a burst of interrupts. As long as the control plane is restored to a normal state within an acceptable time window, the network element does not experience a software failure. OAM handles a wide range of failure scenarios that vary in nature and location, from a software defect to a backhoe tearing apart a fiber conduit by mistake.
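The trade-off between fast detection and false detections can be illustrated with a minimal sketch. It assumes a hypothetical detector that polls a probe and declares a fault only after a configurable number of consecutive misses. The class name, the threshold, and the probe model are illustrative assumptions, not part of any standard or product.

```python
from dataclasses import dataclass


@dataclass
class DebouncedDetector:
    """Declare a fault only after `threshold` consecutive failed probes."""

    threshold: int = 3  # hypothetical observation window, counted in probes
    misses: int = 0

    def observe(self, probe_ok: bool) -> bool:
        """Record one probe result; return True while a fault is declared."""
        self.misses = 0 if probe_ok else self.misses + 1
        return self.misses >= self.threshold


detector = DebouncedDetector(threshold=3)
# A brief stall (two missed probes) followed by recovery is not a fault.
transient = [detector.observe(ok) for ok in (True, False, False, True)]
# A sustained outage crosses the threshold on the third consecutive miss.
sustained = [detector.observe(ok) for ok in (False, False, False)]
```

A larger threshold reduces false detections but lengthens the time to detect a real failure, which is the tension described above.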

There are three major categories of failure:

> Link failure

> Service transport failure

> SLA failure

Ethernet and MPLS OAM: Operations, Administration, and Maintenance

White Paper

[Figure: TMN layers (BML, SML, NML, EML, NEL) against the FCAPS functions (Fault, Configuration, Accounting, Performance, and Security Management), with OAM, FCAPS, and TMN labeled]

Legend:

> NEL: Network Element Layer (devices)

> EML: Element Management Layer (device-level functions)

> NML: Network Management Layer (topology management)

> SML: Service Management Layer (Service Level Agreements (SLAs))

> BML: Business Management Layer (budgeting and billing)

Figure 1. FCAPS model

Objectives

> Protecting revenue by preventing service outages and offering faster service restoration

> Maximizing revenue growth by enabling richer service offerings

> Reducing operational costs by cutting repair costs and operational overhead

Figure 2. OAM objectives

[Figure: Fault, Fault Detection, Fault Notification, Fault Verification, Fault Isolation, Repair, Repair Verification, shown as sequential steps]

Figure 3. OAM process flow



Link Failure

Link failure represents either the complete failure of a link or the performance of a link degrading below an acceptable level. The causes may include an optical transceiver failure at either end of the link, dust or other impurities in the connector, a fiber cut between the elements, or element failure at the other end of the link.

Service Transport Failure

Ethernet services can be transported natively, using Virtual Local Area Networks (VLANs) (IEEE 802.1Q) or stacked VLANs (802.1ad), or MPLS tunnels and MPLS Virtual Circuits (VCs). Each of these transport mechanisms can fail due to software failure, memory corruption, or simple misconfiguration.

Service Level Agreement Failure

The SLA describes the characteristics of the services provided by carriers to their subscribers. Adherence to the SLA can be measured using one or more of the following metrics:

> Frame Delay: delay experienced by the traffic carried by the service

> Frame Delay Variation: variation in that delay

> Frame Loss: percentage of frames passed through the service that were dropped by the network

> Service Availability: percentage of time when the service is available to the subscriber

Monitoring these SLA parameters provides indications of fault or performance issues. The Metro Ethernet Forum (MEF) and the ITU-T are defining standards for performance management of Ethernet services. This white paper focuses on the fault management aspect of SLA failures. SLA failures can be caused by link failures, such as a failing optical transceiver resulting in partial packet loss, or a service transport failure, such as a software failure leading to incorrect forwarding tables.
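The four SLA metrics listed above reduce to simple arithmetic over measurement samples. The sketch below is illustrative only. The function names are hypothetical, and the delay-variation formula (mean absolute difference between consecutive delays) is one common convention assumed here; the MEF and ITU-T specifications may define the metric differently.

```python
from statistics import mean
from typing import Sequence


def frame_loss_pct(sent: int, received: int) -> float:
    """Percentage of frames sent into the service that were dropped."""
    return 100.0 * (sent - received) / sent if sent else 0.0


def frame_delay_ms(delays_ms: Sequence[float]) -> float:
    """Average one-way delay over the sampled frames."""
    return mean(delays_ms)


def delay_variation_ms(delays_ms: Sequence[float]) -> float:
    """Assumed convention: mean absolute difference of consecutive delays."""
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return mean(diffs) if diffs else 0.0


def availability_pct(available_s: float, total_s: float) -> float:
    """Percentage of the measurement period in which the service was up."""
    return 100.0 * available_s / total_s


samples_ms = [10.0, 12.0, 11.0, 15.0]  # made-up delay samples
loss = frame_loss_pct(sent=1000, received=990)
delay = frame_delay_ms(samples_ms)
variation = delay_variation_ms(samples_ms)
availability = availability_pct(available_s=3590, total_s=3600)
```

Comparing each value against the thresholds agreed in the SLA gives the fault indication discussed above.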

Fault Notification

Once detected by the network element layer, the fault needs to be conveyed to the entities that will work toward repairing it. The repair can require either human or automated servicing, such as the manual replacement of a faulty transceiver or a Rapid Spanning Tree Protocol (RSTP) reconvergence after a link failure, respectively. In any case, fault notification should be:

> Responsive: the time saved will protect revenue and may avoid penalties.

> Meaningful: a mere "link down" Simple Network Management Protocol (SNMP) trap sent when an optical transceiver fails is insufficient. A trap containing information regarding the faulty transceiver and the reason for the failure reduces troubleshooting cost.


Ciena's Carrier Ethernet Service Delivery (CESD) switches are optimized to enable network reconvergence below 50 ms. These enhancements allow Ethernet service-delivery networks based on Ciena products to support critical, time-sensitive applications with the same SLAs and guarantees of SONET/SDH optical rings. This level of performance is achieved, in part, by providing high-priority, interrupt-based failure detection, shielding services from link-level failures.

Ciena's True Carrier Ethernet™ offerings are the only access/metro edge solutions that enable service providers to deploy any mix of Ethernet and MPLS-based service transports over a common infrastructure. This allows service providers to migrate easily from Ethernet to MPLS access deployments and extend the services and capabilities of an MPLS core network directly to subscribers, with no additional capital investment required.

Ciena, through the early adoption of IEEE 802.1ag Connectivity Fault Management (CFM), provides VLAN-based service transport OAM. The combination of Label Switched Path (LSP) ping, LSP traceroute, Virtual Circuit Connection Verification (VCCV), Bi-directional Forwarding Detection (BFD), and Fast ReRoute (FRR) provides comprehensive MPLS-based service transport OAM.

Ciena's CESD switches offer intelligent classification and queue servicing, which minimizes frame delay and variation. In addition, Ciena provides a unique set of self-healing techniques at the link and service transport layers, to minimize SLA failures relating to frame loss and service availability.


> Concise: sending multiple traps with redundant failure information will obfuscate the real cause of the failure and slow down the fault isolation step.

Fault Verification

After notification, the Network Operations Center (NOC) engineer should verify the fault, and determine whether the condition persists. By the time the "link fail" indication is received, the Ethernet network will have reconverged.

Under most conditions, failover and restoration with Ciena's Carrier Ethernet Service Delivery devices takes less than 50 ms. Fault verification using on-demand OAM techniques eliminates false failure indications. Not verifying the validity of the fault could lead the network operator to try to isolate a failure that does not exist.

Fault Isolation

Fault isolation consists of determining the exact source, location, and nature of the fault, including the specific network element(s) and network layer(s) experiencing the fault. A failure at a low level may impact higher levels and lead to additional failures. For example, a link failure can lead to broken MPLS tunnel connectivity, also impacting all of the MPLS VCs that tunnel carries.

Notification of a low-level failure can be followed or surrounded by higher-level failure notifications. This flood of related notifications makes fault isolation more difficult, time-consuming, and costly. Features such as alarm correlation help minimize the cost of isolating a fault by decreasing the number of fault notification messages.
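Alarm correlation of this kind can be sketched as a walk over a dependency map: an alarm is suppressed when a lower-level entity that carries it is also in alarm. The entity names and the `CARRIED_BY` map below are hypothetical, and the sketch does not describe how any particular management system implements correlation.

```python
from typing import Dict, Set

# Hypothetical dependency map: each entity points to the lower-level
# entity that carries it (VC -> MPLS tunnel -> link).
CARRIED_BY: Dict[str, str] = {
    "vc-a": "tunnel-1",
    "vc-b": "tunnel-1",
    "tunnel-1": "link-1",
}


def root_causes(alarms: Set[str]) -> Set[str]:
    """Keep only the alarms that no lower-level alarm already explains."""
    roots = set()
    for entity in alarms:
        lower = CARRIED_BY.get(entity)
        explained = False
        # Walk down the stack looking for an alarmed carrier.
        while lower is not None:
            if lower in alarms:
                explained = True
                break
            lower = CARRIED_BY.get(lower)
        if not explained:
            roots.add(entity)
    return roots


# A link failure raises alarms at every layer; only the link is reported.
storm = {"link-1", "tunnel-1", "vc-a", "vc-b"}
reported = root_causes(storm)
```

In this example four notifications collapse into one, pointing the operator at the link rather than at the tunnel or the VCs it carries.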

Repair

Depending on the efficiency of the OAM process, repair and preventative maintenance can occur at different stages:

> After the fault impacts the service. Time-to-repair is most critical, as the network operator needs to remedy the problem quickly to restore the service. Ciena's True Carrier Ethernet solutions provide modularity in the network elements, enabling the network operator to change only the failed element, saving time and eliminating impacts to other services. For example, risk of error is eliminated because the failure of a hot-swappable transceiver does not require the replacement and re-cabling of the entire network element.

> Before the fault impacts the service. Redundancy enables proactive maintenance, significantly reducing service outage times. Ciena's modular solution, coupled with redundant links, control modules, power supplies, and fans, allows non-invasive repair of network components, protecting the services the components carry. For example, the failure of a redundant control module will lead only to non-invasive switchover to the standby module.

> Before the fault leads to an element or network failure, such as a performance degradation scenario. By continuously monitoring key metrics relating to element and network health, service providers can schedule maintenance preemptively, thereby using fewer resources.

Repair Verification

After a remedy is enacted, the same on-demand OAM mechanisms used during fault verification confirm that the fault no longer exists. An IP ping, for example, can be used both to verify an IP connectivity fault on the control plane and to confirm that connectivity has been restored.


Ciena provides a comprehensive solution for optimum fault notification, including high-priority generation of SNMP traps with a content focused on failure source. In addition, Ciena's Ethernet Services Manager (ESM) solution offers alarm correlation capabilities enabling network operators to associate alarms to more quickly isolate the cause of the fault.

Ciena offers a complete on-demand OAM solution, enabling the network operator to conduct layer-by-layer fault isolation (link, service transport, and SLA layers). Figure 4 shows the extent of the various OAM mechanisms useful for isolating faults.

[Figure: Link Layer, Service Transport Layer, and Service Agreement Layer, shown for Ethernet and MPLS]

Figure 4. Major network fault categories



OAM Protocols

With the addition of comprehensive OAM capabilities, Ethernet and MPLS offer a complete feature set that allows carriers to maximize Ethernet-based service revenue. IEEE, IETF, ITU-T, and MEF now describe mechanisms that report the status of a given end-to-end service, representing a subscriber-centric view of the network, and provide link connectivity information, representing a provider-centric view of the network. Figure 5 offers a high-level view of these mechanisms against the OAM process flow and different failure categories.

IEEE 802.3ah Ethernet in the First Mile (EFM) OAM

EFM OAM, described in Figure 6, provides link-layer mechanisms that complement applications that may reside in higher layers (such as IEEE 802.1ag or MEF Service OAM). EFM OAM, also called link OAM, encompasses a simple protocol that operates across a single link.

Thresholds are configured to monitor signal degradation, such as frame errors. Messages are passed across the link to communicate statistics regarding link health. When a failing link is detected, SNMP communicates this to management stations. In addition, the link may be taken out of service and placed in remote loopback mode for fault isolation. Prior to placing a link in service, EFM OAM may be used to test the performance of the link. Once verified to be operational and error-free, the link is taken out of remote loopback and placed in service. Standby links may be tested continuously prior to being activated by protocols such as IEEE 802.1w RSTP or IEEE 802.1aq Shortest Path Bridging.

IEEE 802.1ag Connectivity Fault Management

Building upon IEEE 802.3ah EFM OAM, IEEE 802.1ag CFM specifies capabilities for detecting, isolating, and reporting connectivity faults for VLAN-based service transport networks. CFM operates at both the physical and logical levels, monitoring and troubleshooting faults. For instance, CFM can monitor physical links between adjacent or distant devices. In addition, fault monitoring between two end-points can be configured based on a logical network layer (such as per-VLAN). Key CFM features are shown in Figure 7.

The CFM protocol, often called Ethernet OAM, sends heartbeat-style Continuity Check Messages (CCMs). Failure to receive these messages, in order, in a certain amount of time indicates one or more possible network errors, including path or device failure or network configuration problems. Management stations monitor the status of the reception of CCMs and take appropriate action.
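The receive-side check can be sketched as a small monitor that tracks the arrival time and sequence number of each CCM. The 3.5x loss multiplier is the threshold commonly associated with IEEE 802.1ag and is treated here as an assumption, as are the class and field names. The sketch models only the timing logic, not the CCM frame format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CcmMonitor:
    """Track CCM reception from one remote end point (illustrative only)."""

    interval_s: float               # configured CCM transmission interval
    loss_multiplier: float = 3.5    # assumed loss-of-continuity threshold
    last_rx_s: Optional[float] = None
    last_seq: Optional[int] = None
    out_of_order: int = 0

    def receive(self, seq: int, now_s: float) -> None:
        """Record a CCM; count it if its sequence number is unexpected."""
        if self.last_seq is not None and seq != self.last_seq + 1:
            self.out_of_order += 1
        self.last_seq = seq
        self.last_rx_s = now_s

    def continuity_lost(self, now_s: float) -> bool:
        """True when no CCM has arrived within the loss window."""
        if self.last_rx_s is None:
            return False  # nothing received yet, nothing to time out
        return now_s - self.last_rx_s > self.loss_multiplier * self.interval_s


monitor = CcmMonitor(interval_s=1.0)
monitor.receive(seq=1, now_s=0.0)
monitor.receive(seq=2, now_s=1.0)
monitor.receive(seq=4, now_s=2.0)  # sequence 3 never arrived
```

With a 1-second interval, `monitor.continuity_lost(5.0)` is still false (3.0 s of silence) while `monitor.continuity_lost(6.0)` is true (4.0 s exceeds the 3.5 s window).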


IEEE 802.3ah EFM OAM

Features: Benefits

> Auto-discovery: Eliminates the need for operator configuration

> Uni-directional Fault Signaling: Enables the detection of a one-way link failure

> Remote Loopback: Provides on-demand link diagnostics, including bit-error rate approximation

> Link Monitoring: Offers proactive, traffic-based threshold link monitoring

> Critical Events: Supports communication of network element conditions that may cause link failure, including power and temperature

> Layer 2 Variable Retrieval: Allows supplemental link statistics collection, augmenting SNMP

> Organization-specific Extensions: Enables standards development organizations and vendors to expand scope

Figure 6. IEEE 802.3ah EFM OAM

[Figure: matrix of OAM protocols (MEF Service OAM; VCCV/BFD and LSP Ping; IEEE 802.1ag CFM/ITU-T Y.1731; IEEE 802.3ah EFM OAM; MSTP; SNMP) against the OAM process flow steps (Fault Detection, Fault Notification, Fault Verification, Fault Isolation, Repair, Repair Verification) and the failure categories (Link, Service Transport, Service Level Agreement)]

Figure 5. OAM protocols matrix

IEEE 802.1ag CFM

Features: Benefits

> Continuity Check: Continuously verifies VLAN connectivity and may indicate network faults or misconfigurations

> Loopback Request (MAC ping): Offers on-demand or proactive indication of VLAN control-plane responsiveness

> Linktrace Request (MAC traceroute): Provides on-demand or proactive VLAN topology information

Figure 7. IEEE 802.1ag CFM



Troubleshooting tools are provided in the form of Media Access Control (MAC) ping (formally known as IEEE 802.1ag Loopback Request) and MAC traceroute (formally known as IEEE 802.1ag Linktrace Request). Network operators may initiate these features, or the features may run automatically as monitoring functions in background processes.

Since CFM is being developed after completion of the IEEE 802.1ad Provider Bridges protocol, a second important aspect of the project allows multiple nested Maintenance Domains (MDs) to coexist on the same physical network, each potentially managed by a different administrative organization (service provider or network operator).

ITU-T Y.1731

ITU-T Study Group 13 developed Y.1731 in cooperation with IEEE 802.1ag CFM, further defining VLAN-based service transport OAM functionality. Several additional features offer performance monitoring capabilities. ITU-T Y.1731 and CFM use an identical frame format and share the same operation code (OpCode) space. As a result, these complementary protocols are simpler to deploy in a service provider's network. Figure 8 provides a summary of the features contained in Y.1731.

VLAN-based service transport networks configure certain network elements as Maintenance End Points (MEPs). These MEPs sit at the boundaries of Ethernet domains. Figure 9 shows the span of the different OAM mechanisms offered by Y.1731.

MPLS

MPLS deployed to the customer premises facilitates the interconnection of the access infrastructure with the existing MPLS core network, while increasing the need for MPLS-specific OAM tools. MPLS OAM features and basic MPLS constructs are shown in Figures 10 and 11.

LSP Ping

LSP ping is an in-band, on-demand mechanism to verify the status of an MPLS tunnel. An LSP can fail because of misconfigurations such as disabled MPLS, mismatched labels, or routing into the wrong tunnel, or because of broken Label Distribution Protocol (LDP) adjacencies, corruption of Forwarding Information Bases (FIB), or other software/hardware failures. LSP ping sends an echo request to a target Label Switch Router (LSR) using MPLS addressing. To prevent the echo request from being routed to its destination as an ordinary IP packet, the destination IP address of the echo request packet is taken from the 127.0.0.0/8 range. If reached, the destination LSR sends an echo reply back to the originator of the MPLS echo request.
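The addressing rule can be checked with the Python standard library. The sketch below only tests whether a candidate destination falls inside 127.0.0.0/8. The function name is hypothetical, and no MPLS echo packet is built or sent.

```python
import ipaddress

# Range from which the echo request's destination IP address is taken.
ECHO_DESTINATION_RANGE = ipaddress.ip_network("127.0.0.0/8")


def is_valid_echo_destination(address: str) -> bool:
    """Return True when `address` lies inside 127.0.0.0/8."""
    return ipaddress.ip_address(address) in ECHO_DESTINATION_RANGE


inside = is_valid_echo_destination("127.0.0.1")
outside = is_valid_echo_destination("10.1.1.1")
```

Because routers do not forward packets addressed to this range, an echo request that leaves the LSP cannot reach the target by IP routing and mask the fault.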

ITU-T Y.1731

Features: Benefits

> Alarm Indication Signal (ETH-AIS): Provides fault notification for devices not participating in the VLAN-based Ethernet Continuity Check

> Remote Defect Indication (ETH-RDI): Offers fault indication of the other end of a VLAN-based Ethernet service

> Locked Signal (ETH-LCK): Enables maintenance actions while differentiating and isolating actual fault conditions

> Test Signal (ETH-Test): Allows a one-way, on-demand, in-service or out-of-service VLAN test, such as throughput or frame loss

> Performance Monitoring (ETH-PM): Monitors traffic performance on a point-to-point, end-to-end, VLAN-based Ethernet service

> Frame Loss Measurement (ETH-LM): Collects end-to-end frame loss information to approximate severely errored seconds, which indicate VLAN-based service transport availability

> Frame Delay Measurement (ETH-DM): Provides an on-demand Frame Delay and Frame Delay Variation measurement between two points of the VLAN-based service

Figure 8. ITU-T Y.1731

[Figure: CE, UNI, and MEP elements on an Ethernet network, with the spans of ETH-PM, ETH-AIS, ETH-RDI, ETH-LCK, ETH-Test, ETH-LM, and ETH-DM]

Figure 9. ITU-T Y.1731 architecture

Ciena offers a solution allowing transport of Ethernet services, either natively or using MPLS encapsulation.

MPLS

Features: Benefits

> Label Switched Path Ping: Offers on-demand connectivity information about MPLS tunnels

> LSP Traceroute: Provides MPLS switching and Maximum Transmission Unit (MTU) configuration information

> Virtual Circuit Connection Verification: Enables proactive connectivity monitoring of MPLS pseudowires

> Bi-directional Forwarding Detection: Allows scalable, proactive data-plane verification of MPLS LSPs

> Fast ReRoute: Provides automated repair of MPLS failures

Figure 10. MPLS OAM



LSP Traceroute

LSP traceroute determines the hop-by-hop path and destination of an LSP. Like LSP ping, traceroute is an in-band, on-demand MPLS OAM utility that uses an MPLS echo request/reply mechanism to detect MTU misconfiguration between LSRs. However, with LSP traceroute, all LSRs along the path, up to and including the destination LSR, reply to the echo request. This technique allows the operator to identify and distinguish LSRs along a path.

Virtual Circuit Connection Verification

Using LSP ping, a service provider can monitor the status of an MPLS tunnel. To diagnose a problem within the tunnel, the service provider needs a mechanism to verify the connectivity of the pseudowires (VCs). VCCV allows proactive monitoring of pseudowires within MPLS tunnels by establishing a control channel associated with each pseudowire.

Bi-directional Forwarding Detection

VCCV requires involvement of the MPLS control plane; as the number of VCs increases, so will the load on the control plane. BFD allows systematic and more scalable detection of MPLS LSP data plane failures, with less involvement from the control plane. As a result, BFD allows faster detection on a larger number of LSPs. BFD relies on a hello packet exchanged by neighbors at negotiated, regular intervals. When a hello packet is not received as expected, the neighbor is declared down.
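The timing behind "not received as expected" can be sketched as follows. The sketch assumes the asynchronous-mode convention described in RFC 5880, in which the detection time is the remote detect multiplier times the larger of the local required receive interval and the remote desired transmit interval. Function and parameter names are illustrative, and the values are made up.

```python
def negotiated_interval_ms(local_required_min_rx_ms: int,
                           remote_desired_min_tx_ms: int) -> int:
    """The slower of the two advertised rates governs hello reception."""
    return max(local_required_min_rx_ms, remote_desired_min_tx_ms)


def detection_time_ms(remote_detect_mult: int,
                      local_required_min_rx_ms: int,
                      remote_desired_min_tx_ms: int) -> int:
    """Silence longer than this means the neighbor is declared down."""
    return remote_detect_mult * negotiated_interval_ms(
        local_required_min_rx_ms, remote_desired_min_tx_ms)


def neighbor_down(last_hello_ms: int, now_ms: int, detect_ms: int) -> bool:
    """True once the time since the last hello exceeds the detection time."""
    return now_ms - last_hello_ms > detect_ms


# Hypothetical session: we can receive every 10 ms, the peer sends every 15 ms.
detect_ms = detection_time_ms(remote_detect_mult=3,
                              local_required_min_rx_ms=10,
                              remote_desired_min_tx_ms=15)
```

Here `detect_ms` is 45, so a neighbor whose last hello arrived at t = 100 ms is still up at t = 145 ms and is declared down at t = 146 ms.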

Fast ReRoute

Fast ReRoute allows automated repair of LSP tunnels to reduce packet loss on LSPs. If there is a link or node failure, an LSP employing Fast ReRoute can redirect MPLS traffic to previously computed and established alternate paths around the failed link or node. The alternate paths are selected during the establishment of a primary LSP under hop-by-hop control. With Fast ReRoute enabled, Resource ReSerVation Protocol-Traffic Engineering (RSVP-TE) establishes local alternate LSPs for each potential point of failure along the primary path.
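The idea of precomputed local detours can be illustrated with a small sketch. The topology, node names, and `BACKUPS` table are hypothetical, and the sketch models only the path substitution, not RSVP-TE signaling or label operations.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]

PRIMARY: List[str] = ["A", "B", "C", "D"]

# Hypothetical detours, established when the primary LSP was set up:
# one local alternate path per link that could fail.
BACKUPS: Dict[Link, List[str]] = {
    ("A", "B"): ["A", "E", "B"],
    ("B", "C"): ["B", "F", "C"],
    ("C", "D"): ["C", "G", "D"],
}


def reroute(primary: List[str], backups: Dict[Link, List[str]],
            failed: Link) -> List[str]:
    """Splice the precomputed detour around `failed` into the primary path."""
    for i in range(len(primary) - 1):
        if (primary[i], primary[i + 1]) == failed and failed in backups:
            return primary[:i] + backups[failed] + primary[i + 2:]
    return list(primary)  # link not on the path, or no detour available


after_failure = reroute(PRIMARY, BACKUPS, ("B", "C"))
```

Because the detour already exists when the failure occurs, the repair is a table lookup at the node next to the failure rather than a new path computation.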

MEF Service OAM

The MEF is pursuing a complementary set of OAM-related functions operating at the SLA layer. The Phase 1 specification will contain performance monitoring capabilities for point-to-point services reflecting the frame loss ratio, frame delay (latency), and frame delay variation (jitter) characteristics of the service, as shown in Figure 12.

In addition, per-service fault management will be supported for point-to-point, point-to-multipoint, and multipoint services. Fault detection encompasses loss of continuity between management end-points and detection of potential for loops in the service. This fault detection/verification capability is supported proactively or on demand through operator action. MEF Service OAM, often called Service OAM, also provides fault isolation and fault notification.

IP

Ethernet services offer the benefit of low deployment costs by not requiring IP provisioning of each individual data plane element. However, the control plane uses mostly IP-based protocols, such as Telnet, SNMP, or IGMP. In that regard, control plane failures must be detected at the IP level. Two mechanisms have been in use since the advent of IP networking: IP ping, which provides on-demand connectivity verification of the IP control plane, and IP traceroute, which offers routing and delay information for an IP destination.


[Figure: VC A and VC B carried within an MPLS tunnel]

Figure 11. Basic MPLS constructs

MEF Service OAM

Features: Benefits

> Point-to-point Ethernet Virtual Circuit Performance Monitoring, Point-to-multipoint EVC PM, and Multipoint-to-multipoint EVC PM: Provides SLA assurance for different services

> EVC Fault Management: Enables identification and isolation of fault at the SLA layer

Figure 12. MEF Service OAM



IP Ping

IP ping is a basic mechanism that verifies IP connectivity through the network. It verifies that a given IP address exists, is reachable, and can accept ping requests, and calculates the latency between the control planes of two IP network elements.

IP Traceroute

IP traceroute is another OAM tool that records and displays the IP message route between two IP elements. It also calculates the latency between the control planes of each IP element of the route.

Conclusion

Ciena's Carrier Ethernet Service Delivery solution, described in Figure 13, enables service providers to operate, administer, and maintain any mix of Ethernet and MPLS-based L2 VPNs effectively. By leveraging this unique OAM capability, service providers can protect current revenue and maximize revenue growth, while reducing operational costs.

Objectives: Carrier Ethernet Service Delivery Solution

Protects revenue by:

Preventing service outages:

> Sub-50 ms automated network reconvergence

> Robust Quality of Service (QoS) architecture minimizes SLA failures

> Modular architecture enables planned non-invasive repairs

> Redundancy for mission-critical network components

Offering faster service restoration:

> Generates precise failure information more quickly

> Service-aware OAM feature set intelligently traverses each layer as needed

> Complete OAM feature set covers each network layer (link, service transport and SLA)

Maximizes revenue growth by:

Enabling richer service offerings:

> Comprehensive Ethernet and MPLS OAM feature sets

> Intelligent classification

Reduces operational costs by:

Reducing repair costs:

> Advanced alarm correlation simplifies fault isolation

> Hot-swappable solution enables shorter and less expensive repairs

Reducing operational overhead:

> On-demand OAM techniques eliminate unnecessary investigation of false failure indications

> Modular solution reduces cost of spares

> Proactive monitoring enables cost-effective preemptive maintenance

Figure 13. Ciena's Carrier Ethernet Service Delivery solution

1201 Winterson Road, Linthicum, MD 21090

1.800.207.3714 (US and Canada)

1.410.865.8671 (outside US)

+44.20.7012.5555 (international)

www.ciena.com

Specialising in transition to service-driven networks to help you change the way you compete.

Ciena may from time to time make changes to the products or specifications contained herein without notice. © 2009 Ciena Corporation. All rights reserved. WP062A4 2.2009
