
© 2014 METASWITCH NETWORKS. ALL RIGHTS RESERVED. WWW.METASWITCH.COM

1 EXECUTIVE SUMMARY

Changes in the traffic patterns in large carrier networks have challenged design rules and accompanying economics and planning rules. By assessing these shifts in traffic patterns, and optimizing the architecture for the new reality, the operational complexity of the network(s) can be reduced, and CAPEX spend cut by up to 75% compared with existing approaches.

This use case will look at traffic that transits through a single exchange or PoP of a carrier’s network, but does not traverse the carrier’s backbone. This traffic pattern can be referred to as intra–exchange transit traffic.

2 CURRENT BCP

The current design of exchanges or Points–of–Presence (PoPs) dates back to a time when most of the carrier’s customers were lightly connected to the Internet (i.e. only a few connections, maybe only one) and most of the traffic presented by a customer (or terminating at that customer) had to travel over the carrier’s backbone (or inter–PoP) links to remote PoPs or peering points. This traffic needed to be guided to the most efficient backbone links for that traffic flow, and could be considered North–South traffic.

That traffic pattern led to the design discussed below.

2.1 Currently deployed technology

A substantial portion of the traffic entering a given exchange could be assumed, at the time, to exit that exchange through a path that traversed a Provider Edge (PE) router¹ and a Provider (P) router². This was due to the observed behavior, at that time, that most traffic did not stay local, i.e. customers in the same exchange did not exchange much traffic with one another. Therefore, a reasonable optimization could be made (and was) to connect the customer access equipment directly to the aggregation layer routers, and the aggregation routers directly to the core routers. This avoided having one or more layers of equipment that would behave purely as pass-through for the traffic (i.e. a layer of interconnect switches between the access equipment and the PE routers, and/or between the PE routers and the P routers, in either a customer → backbone or backbone → customer flow).

This optimization only holds true, however, if the links between a given pair of layers have moderate to high utilization. If they are lightly utilized, then the use of an aggregation layer³ between the two layers will almost always make economic sense, assuming the cost of ports on the grooming layer (usually an Ethernet switch) is substantially lower than the cost of the ports they are groomed into.

An example, using the access and PE layers⁴, follows. Assume that we have one hundred CE⁵ devices that need to be connected to one PE router. The links between the CE and PE elements are utilized at some percentage of their design capacity.

Table 1 shows the economic model outputs for using aggregation switches or not, in both a low utilization (30%) and a high utilization (70%) case. Access equipment is not factored into the model, as the same amount of access equipment is assumed to be needed in either case (it depends only on the CE count). The table shows that a substantial CAPEX benefit can be realized simply by grooming low-utilization links.

As mentioned earlier, however, more traffic is traversing East ↔ West than before.

Table 1: This table identifies the cost differential between the direct and aggregated approaches to connecting the CE devices to aggregation (or PE) routers, in both a low utilization and a high utilization environment. It identifies only the cost variables that are impacted by the change; other costs remain fixed in this model (such as the cost of the CE, etc.). Furthermore, knees are not covered (the points where an increase in ports requires the acquisition of new cards or routers). The table does include a 20% engineering buffer for the groomed links. The biggest impact on cost savings in this model is the differential between the costs of the grooming and PE ports. Port costs are inclusive of an amortized percentage of common equipment costs.⁶
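The break-even behavior behind Table 1 can be reproduced in a short sketch. The port prices below are invented placeholders, not the paper's actual model inputs; the point is only that grooming wins when links are lightly utilized and loses when they run hot.

```python
import math

def port_costs(n_ce, utilization, pe_port_cost, agg_port_cost, buffer=1.20):
    """Compare direct CE-to-PE attachment against grooming through an
    aggregation switch. Returns (direct_cost, aggregated_cost).

    Direct: one PE port per CE link, however lightly used.
    Aggregated: every CE link lands on a cheap switch port, and the
    groomed traffic (total utilization plus a 20% engineering buffer)
    rides to the PE over as few PE ports as possible.
    """
    direct = n_ce * pe_port_cost
    pe_ports = math.ceil(n_ce * utilization * buffer)   # groomed uplinks
    aggregated = (n_ce + pe_ports) * agg_port_cost + pe_ports * pe_port_cost
    return direct, aggregated

# Hypothetical prices: a PE port at 10x the cost of a switch port.
print(port_costs(100, 0.30, 10_000, 1_000))  # low utilization: grooming wins
print(port_costs(100, 0.70, 10_000, 1_000))  # high utilization: direct wins
```

With these assumed prices, the 30% case needs only about a third as many PE ports, more than paying for the extra switch ports; at 70% the groomed PE ports plus switch ports cost slightly more than connecting directly.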

However, when this design was developed, the assumption of hot aggregation links was mostly correct, and, more importantly, most of the traffic traversed the core devices to exit the PoP.

INTELLIGENT AGGREGATION
LAYER 3 POP AGGREGATION - SCALING THE HIGHLY MESHED NETWORK

Christopher David Liljenstolpe

Metaswitch Networks


2.2 What’s changed?

Since this design pattern was developed, the pattern of interconnection has changed remarkably. The number of substantial networks in the Internet has shrunk, with a few major content and eyeball networks presenting most of the traffic, and all of them having a wide geographic footprint. Most of the eyeball and content networks interconnect in most geographies, to the extent that a substantial fraction of the traffic, due to hot-potato⁷ routing, never leaves the exchange where it ingresses the IXP's network. For example, the YouTube video stream of a Comcast customer in San Francisco will enter the IXP's San Francisco PoP via Google's connection to that PoP, and immediately be routed out of that PoP via Comcast's connection to the same PoP.

This means that, as a fraction of total traffic, more and more of the traffic in the IXP's network will not traverse the IXP's backbone, but will instead ingress and egress the IXP's network at the same PoP.

This is referred to as East-West traffic, as compared to North-South traffic, which travels over the IXP's network between two PoPs. Since this East-West traffic does not need to traverse the core network, substantial savings could be made if a more cost-optimized solution were available to interconnect the IXP's customers within a given PoP.

Table 2 shows the CAPEX improvement if that East ↔ West traffic could be offloaded from the PE routers. The technical solution to accomplish this is outlined later in this paper. As the inputs are the same as in Table 1, a direct comparison between the results of Table 2 and Table 1 can be made. The additional column in Table 2 is the percentage of East ↔ West traffic that can be offloaded from the PE routers, thereby further reducing the PE port count required. The direct results are not shown in Table 2; please refer to Table 1 for those baseline numbers.
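The offload column can be folded into the same kind of sketch (again with invented inputs, not the paper's model): the offloaded East ↔ West fraction simply shrinks the traffic that must reach the PE before the engineering buffer is applied.

```python
import math

def pe_ports_needed(n_ce, utilization, offload, buffer=1.20):
    """PE ports required once a fraction of the traffic is switched
    East-West inside the PoP and never touches the PE router."""
    to_pe = n_ce * utilization * (1.0 - offload) * buffer
    return math.ceil(to_pe)

# 100 CE links at 30% utilization: offloading three quarters of the
# traffic locally cuts the PE port requirement by roughly three quarters.
for offload in (0.0, 0.50, 0.75):
    print(offload, pe_ports_needed(100, 0.30, offload))
```

Since PE port count dominates the cost differential in Table 1, the PE-port reduction translates almost directly into the additional CAPEX savings shown in Table 2.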

Table 2: This table identifies the cost differential between a simple grooming architecture and two degrees of East ↔ West offload, in both a low and a high utilization environment. It identifies only the cost variables that are impacted by the change; other costs remain fixed in this model (such as the cost of the CE, etc.). Furthermore, knees are not covered (the points where an increase in ports requires the acquisition of new cards or routers). Port costs are inclusive of an amortized percentage of common equipment costs.

3 WHAT’S THE CHALLENGE?

What was originally a mechanism to reduce the amount of equipment in the PoP, and therefore reduce CAPEX and OPEX, has created unnecessary CAPEX costs now that traffic patterns have shifted to a more East-West oriented set of flows. It would be beneficial to the operator if commodity Ethernet switches could be used to directly interconnect the CEs connected to a PoP, lifting the burden of inter-CE flows from the more complex, and expensive, PE routers.

3.1 Technical issues

In the current architecture, the PE router provides two functions for CE – CE flows.

1. Traffic aggregation — The aggregation of multiple flows heading for the same destination (in this case a CE or a core router) into a set of links that connect to that destination.

2. Route aggregation or default routing — The PE either aggregates the routes that are fed to the aggregation router, or simply advertises default to the CE and handles the optimal route selection once the traffic has ingressed into the PE router.
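The second function can be illustrated with Python's standard ipaddress module (the prefixes are invented for the example): the PE can either collapse the CE routes into one covering advertisement, or hand each CE nothing but a default route and do the real path selection itself.

```python
import ipaddress

# Routes learned from four CE devices in the same PoP (invented prefixes).
ce_routes = [ipaddress.ip_network(p) for p in (
    "192.0.2.0/26", "192.0.2.64/26", "192.0.2.128/26", "192.0.2.192/26")]

# Route aggregation: contiguous CE prefixes collapse into one advertisement.
aggregated = list(ipaddress.collapse_addresses(ce_routes))
print(aggregated)  # [IPv4Network('192.0.2.0/24')]

# Default routing: alternatively the PE advertises only 0.0.0.0/0 to each
# CE and performs the optimal route selection once traffic reaches it.
default = ipaddress.ip_network("0.0.0.0/0")
print(default.supernet_of(ce_routes[0]))  # True: default covers everything
```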

The solution discussed in section 4.1 will provide a mechanism to handle CE – CE flow aggregation and path selection. It will do so in an L3-aware model, where the smart aggregation switch will make forwarding decisions as an L3 device would, but without all of the overhead of a fully-functional PE router. It will handle the vast majority of CE – CE flow cases, and allow more complex forwarding decisions to be made by the existing PE routers (such as inter-PoP or Internet forwarding, policy-based forwarding, etc.). This will optimize the network by requiring PE router capacity only for non-CE–CE flows.

3.2 Business impacts

As can be seen in the tables presented earlier, if there is substantial East-West traffic and/or low link utilization between the edge and core routers, then there is an economic benefit to providing a switching layer between the aggregation and core router layers. The savings that can be realized can exceed 75% of the CAPEX programmed for aggregation backhaul.

4 PROPOSED SOLUTION

The basic solution being proposed is to place one or more commodity Ethernet switches into the PoP as smart aggregation switches. Those switches would connect the CE devices and/or access devices to the PE routers, as well as to each other. Those switches would be managed by an SDN controller that would allow the most efficient paths to be selected through the switch(es), while maintaining loop-free behavior. The Ethernet switches would not be running an Ethernet control plane, and would therefore avoid the instability that has plagued Ethernet networks at carrier scale. That controller would perform the same functions that the legacy Ethernet control plane protocols performed, but with a global (rather than switch-centric) view. Furthermore, the controller will address the route aggregation functionality previously mentioned.

Furthermore, all of the smart aggregation (commodity Ethernet) switches would have the same hard state (configuration, etc.).⁸ The controller computes each switch's running behavior from the network-wide configuration and communicates that running, dynamic configuration using industry-standard SDN protocols such as OpenFlow, OF-Config, SNMP, NETCONF, etc. Therefore, even though the controller and switches introduce more equipment, the configuration load of the network is actually reduced and centralized.
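A minimal sketch of that configuration model (all names and addresses below are invented): every switch stores the identical network-wide hard state, and the controller derives each switch's running state from it, so only one configuration has to be managed.

```python
# Shared hard state: identical on every smart aggregation switch.
NETWORK_CONFIG = {
    "controller": "198.51.100.10",
    "switches": {
        "agg-sw-1": "198.51.100.11",   # switch name -> local management IP
        "agg-sw-2": "198.51.100.12",
    },
    "vlans": [100, 200],
}

def running_config(switch, net_cfg=NETWORK_CONFIG):
    """Derive one switch's dynamic configuration from the network-wide
    configuration; only the local address differs between switches."""
    return {
        "local_ip": net_cfg["switches"][switch],   # the one per-switch value
        "controller": net_cfg["controller"],
        "vlans": list(net_cfg["vlans"]),           # identical everywhere
    }

print(running_config("agg-sw-1")["local_ip"])  # 198.51.100.11
```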


The controller will also learn the routing topology from the PE routers, as well as by listening to on-net control plane traffic, allowing it to build an L3 forwarding topology. It will distribute that L3 forwarding topology to the smart aggregation switches, allowing them to forward CE – CE traffic without the need to route it through the PE router first.
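The forwarding state pushed to each switch can be sketched as a longest-prefix-match table (prefixes and port names invented): local CE prefixes map to switch ports, and everything else falls through to the PE uplink.

```python
import ipaddress

# Prefix -> egress, as the controller might program it into one switch.
fib = {
    ipaddress.ip_network("192.0.2.0/24"):   "ce-port-7",    # local CE
    ipaddress.ip_network("203.0.113.0/24"): "ce-port-12",   # local CE
    ipaddress.ip_network("0.0.0.0/0"):      "pe-uplink",    # default
}

def lookup(dst):
    """Longest-prefix match: CE-CE traffic stays inside the PoP, and only
    non-local traffic is handed to the PE router."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in fib if addr in net]
    return fib[max(matches, key=lambda net: net.prefixlen)]

print(lookup("203.0.113.9"))   # ce-port-12: switched locally
print(lookup("198.51.100.1"))  # pe-uplink: leaves via the PE
```

A real switch would hold this table in TCAM; the dictionary scan here only models the lookup semantics, not the data-plane implementation.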

4.1 Technical architecture

As was just mentioned, the physical component of this solution is a collection of Ethernet switches that interconnect the CEs served from a given PoP (or set of PoPs) with each other, as well as with the PE routers. A controller would be deployed with those switches to provide a loop-free topology for the CE–CE traffic.

The benefits outlined in the business impacts section and the earlier tables are realized by forwarding traffic that passes between two CE elements in a given PoP without using the PE routers. This removes that traffic load from the PE routers, reducing the number of ports, or the rate of augmentation, required on the core routers, in return for purchasing more capacity in the switches. Given that commodity Ethernet switch ports are substantially cheaper than ports of the same bandwidth on most, if not all, core routers, the benefits of this substitution are obvious.

To address the L3 forwarding and topology requirements, there are three approaches:

1. The controller establishes a BGP peering with the PE routers, in turn learning the best path for each destination in the network. CE routers are peered with the PE routers, and therefore CE-advertised routes are learned by the controller via the PE. In the case where the CE is statically routed from the PE, the SDN controller would learn the CE routes via its BGP peering with the PE routers. The controller learns the path to the next-hop address via the topology information it has learned from its directly connected networks. The controller does not, in this model, announce routes, nor modify next-hop-self (NHS). The benefit of this approach is that the existing routing configuration between the IXC and its customers does not need to be interrupted by the insertion of the new, SDN-controlled aggregation layer.

2. The controller itself peers with BGP-speaking CE elements, as well as the PE routers. It sets NHS on externally learned or announced BGP routes, and uses SDN manipulations to ensure that traffic stays on the switch paths, rather than actually being directed to the controller for forwarding. In this model, the routing configuration between the IXC and its BGP-speaking customers will probably need to be modified. Also, in this model, the classical PE routers are only necessary for high-touch packet treatments that are unavailable in the aggregation switch platforms, whereas in the first model the PEs are also necessary as BGP routing endpoints for CE BGP sessions.

3. If the aggregation switching layer is only connected to residential and SOHO customers that only ever use addresses provided by the ISP who owns the infrastructure, and the addressing plan in use aggregates the routes downstream of each CE into a small number of prefixes (no more than a few tens), then the inter-CE routes could be statically configured into the SDN controller at CE provisioning time. The problem with this option is that it is fragile (static routes) and will only work in well-aggregated address block environments. The advantage is that the only need for routing protocols is to enable the SDN controller to discover the best PE (or core router) for inter-exchange or public Internet traffic.

When considering the impacts (and requirements) of these three options, we advise that the first approach enumerated above (BGP peering with the PE routers) be taken as a first step in deploying this model.

4.1.1 Solution diagrams

We demonstrate two potential uses of this approach in the diagrams that follow. Figures 1, 2, and 3 show an IXC use case where the CEs are connected directly to the aggregation switches, whereas figures 4, 5, and 6 show a consumer-focused service provider that has content delivery network (CDN) capabilities distributed to the PoP level.

Figure 1: An example topology for the solution being discussed. Solid blue lines are physical connectivity, dashed orange lines are SDN control connections, and solid orange lines are routing links between the SDN controller and the existing router(s).


Figure 2: This diagram shows the data flow between two CPEs on the network, both downstream of different SDN–enabled aggregation switches. The solid blue lines show the data path, and the text notes show what look–up is being done at each switch. The degenerate case (both CE devices connected to the same SDN–enabled aggregation switch) would keep the traffic isolated to the one SDN–enabled aggregation switch.

Figure 3: This diagram shows the same basic case as in figure 2, but in this case one of the CE devices is connected to a legacy aggregation switch.

Figure 4: The consumer–focused example architecture. Color codes are the same as in figure 1.

Figure 5: The example here has two consumers, each requesting content from the SP’s CDN infrastructure, and the smart aggregation switches directing each request (and the associated return traffic) to the correct CDN server.


Figure 6: This example shows how the smart aggregation switches send traffic to the PE router for flows that are not within the PoP (in this case the flow is Internet–bound), or otherwise requires PE handling.

4.1.2 What portion can Metaswitch provide?

Metaswitch can provide the SDN controller, including computing and maintaining the Ethernet topology and the loop-avoidance capabilities. The route-reflector requirement is another major component of Metaswitch's offering in this space, and that function is based on Metaswitch's mature BGP stack.

4.1.3 What else is required?

The IXC/ISP would need to select an aggregation switch from a list of switches that can be controlled by the Metaswitch SDN controller.

4.1.4 Challenges

The challenges represented by this approach are as follows:

1. This is a topology and operational change for most carriers. Two new network elements are being introduced (the aggregation switches and the controller), which will require design engineering and lab testing work, as well as training for the operations and engineering staff. The mitigation is that these switches will look and behave just like any other Ethernet switch, at least from the view of the routers connected through the switches in question. As such, the deployment can be staged; no "flag day" cutover is required.

2. Occasional re–grooming of links between aggregation switches may be necessary as traffic flows change within the PoP. Those would be done during scheduled maintenance windows.

3. The PE router vendors will see this as impacting their revenue stream at an IXC/ISP that is considering this move.

5 SUMMATION

By utilizing Metaswitch’s unique synthesis of SDN controller technology and a mature IP routing stack, Metaswitch can provide the service provider with most of the building blocks necessary to deploy commodity Ethernet switches in the PoP, reducing the CAPEX (and support-contract-driven OPEX) of CE to PE router links, realistically by half or more. The approach does not require changes to the routers, and utilizes standard infrastructure components such as BGP and Ethernet, thereby easing adoption.

ABOUT THE AUTHOR

Christopher Liljenstolpe is Director of Solutions Architecture for the Metaswitch Networking Business Unit. Christopher previously served as solutions architect for Big Switch Networks where he played a key role realizing some of the industry’s earliest SDN deployments. Prior to Big Switch, he was the director of architecture, networks and services at Telstra. His past roles have also included Chief Architect, Cable & Wireless, CTO for the IP Division, APAC for Alcatel-Lucent, and L3 architect for Woven Systems. Christopher has acted as co-chair of the Operations Area Working Group in the IETF and has spoken widely on IP, MPLS, SDN, and network operational issues in various standards bodies and conferences.


© 2014 METASWITCH NETWORKS. ALL RIGHTS RESERVED.

Subject to change without notice. Contact your local sales representative or go to www.metaswitch.com/specs for most current information.

WWW.METASWITCH.COM

ENDNOTES

1 Sometimes a PE router is referred to as an aggregation router or Label Edge Router (LER). This paper will use the term PE router, or just PE.

2 P routers are also referred to as core routers, MPLS core switches, or Label Switch Routers (LSR). This paper will use the P router terminology.

3 The aggregation layer is a layer that is introduced between the access layer and the PE layer. It directs and grooms traffic to/from the correct PE or CE devices; however, it does not act as a service definition or management layer. In the case of this paper, the aggregation layer is provided by aggregation switches. The aggregation switch should not be confused with an aggregation router, which is another name for a PE.

4 The access layer generally refers to the infrastructure that connects customer premise equipment (CPE) with the carrier's infrastructure, and the provider edge (PE) layer generally refers to the equipment that is the MPLS and/or IP edge of the service provider's network. For high-bandwidth or other direct-connect customers, the CPE is directly connected to the PE router. Lower-bandwidth customers, such as SME or residential customers, will usually have access equipment that connects their CPE to the PE infrastructure. That access equipment may be a DSLAM, a cable CMTS, or a Metro Ethernet switch. That access equipment is then connected directly to the PE.

5 The Customer Edge device is the demarc of the customer's network and connects the customer's network to the service provider's infrastructure. It can be owned and/or controlled by either the customer or the service provider. Other names for the CE are Customer Premise Equipment (CPE) or Customer Router (CR).

6 In a PE router that can support 10 cards, where each card can support 20 ports, the port cost is the amortized cost of 5% of the card, and 0.5% of the base platform cost.

7 Hot potato routing is the standard routing policy of most IXPs (Inter-Exchange Carriers — in Internet parlance, the backbone, or tier-1, carriers). Hot potato routing says that if an IXP is presented with traffic destined for another network, it will hand that traffic off at the closest point where that network interconnects with the IXP, rather than carrying the traffic through the IXP's internal backbone to a remote handoff which might be better for the remote network. The obverse is cold potato routing.

8 Hard state is state that resides in the device across reboots; generally this is configuration state. In the case of the smart aggregation switches, all of the switches in the PoP would have the same configuration, except each switch's own local IP address. This greatly simplifies network configuration management.