Scalable and FlexibleRouting Service for Tencent Cloud … · 2020-08-07 · Scalable and...

Post on 10-Aug-2020


Scalable and Flexible Routing Service for Tencent Cloud Access Network

Allen Lv, Tencent

Aug, 2020

Agenda

• Challenges

• Architecture

• Design Details

• Experience and Future Work


Tencent Cloud Infrastructure Overview

54+ AZs

27+ Regions

100T+ Public Network Bandwidth Reserved

15EB+ Storage

Diagram labels: Enterprise Branch, CVM, CDB, VPC, Private Line

| Tencent Cloud Access Network Overview

Diagram labels: Tencent Cloud Network, VPC, CVM, CDB, Access Site 1, Access Site 2, Enterprise Branch, Custom IDC, Private Line, VPN, ISP, Tencent Internet exchange (TIX)


Challenges

• Massive scale forwarding table, VRFs, Tunnels…

• Roll out network features fast

• Scale up easily to keep pace with rapid growth in traffic volume

• Low Cost

Line Card Line Card Line Card

| Traditional Commodity Router

• Hardware & Software Vendor Lock-in

• Hard to Scale

• Lack of feature velocity

• High Cost

Primary Processor

Secondary Processor

Switching Fabric


| Design Philosophies

• Scalability: each component scales independently on demand

• Flexibility: fast feature delivery (~2 weeks)

• Reliability: Non-Stop Forwarding (NSF), Non-Stop Routing (NSR), fast failover

• Operationality

Software Defined Router (SDR) Overview

SDR layers shown in the diagram:

• Data Plane … Data Plane: NFV Based Forwarding System

• Routing Plane … Routing Plane: NFV Based Routing System

• Control Plane … Control Plane: NFV Based Controller

• Orchestrator … Orchestrator: NFV Based Orchestrator

Diagram labels: CS, AS, Underlay, Overlay, eBGP, ISP, Customer Router, CPE, ipsec, Tencent Cloud, Tencent FW, Tencent DDoS

Software Defined Router (SDR) Inside

Components:

• NGW: Data Plane

• BGP: Routing Plane

• RNSO: Control Plane

• GNSO: Orchestrator

Diagram labels: Edge Access (EA), External Router, VPC, OSS/BSS

Interface labels: BGP/BFD, FIB/ARP, config/monitor, gRPC, TGRE, VxLAN

| Customer Access (Private-Line GW & VPNGW)

Interoperating with both the Traditional Network and the SDN-Based Network at large scale

Diagram labels: VPNGW (SDR), PLGW (SDR), VPC 10.0.0.0/16, EA, BGP Session, Internet, Customer Router, Traditional Network, SDN-Based Network

| End-user Access (Tencent Internet Exchange)

Large scale forwarding table (10M) and flexible Traffic Engineering

Diagram labels: EA1, TIX1 (SDR), ISP Router1, EA2, TIX2 (SDR), ISP Router2, BGP Session, VPC1 115.159.246.0/24, VPC2 116.150.247.0/24, VxLAN Fabric

| Flexibility – On-Demand Traffic Engineering

• Flexible traffic engineering based on user demand

Diagram labels: Site1, Site2, VPC, SDR1, SDR2, VxLAN Fabric, External Peer 1, External Peer 2, External Peer 3, External Peer 4

<SIP,DIP> ---> <SDR2, VNI>

BGP route
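The rule `<SIP,DIP> ---> <SDR2, VNI>` can be read as a policy entry that overrides the default BGP route for matching traffic. The sketch below is one possible reading, not the implementation described in the talk: the class `TeRule`, the function `select_egress`, the example prefixes, the VNI value, and the tie-break (the rule with the most specific combined prefixes wins) are all assumptions made for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class TeRule:
    """One <SIP, DIP> ---> <SDR, VNI> entry (hypothetical field names)."""
    src: ipaddress.IPv4Network
    dst: ipaddress.IPv4Network
    egress_sdr: str
    vni: int


def select_egress(rules, bgp_next_hop, sip, dip):
    """Return (egress, vni); vni is None when the plain BGP route is used."""
    s = ipaddress.ip_address(sip)
    d = ipaddress.ip_address(dip)
    matches = [r for r in rules if s in r.src and d in r.dst]
    if not matches:
        # No traffic-engineering rule applies: follow the BGP route.
        return bgp_next_hop, None
    # Assumed tie-break: the most specific rule wins.
    best = max(matches, key=lambda r: r.src.prefixlen + r.dst.prefixlen)
    return best.egress_sdr, best.vni


# Illustrative rule: steer one customer subnet towards SDR2 over the fabric.
RULES = [
    TeRule(
        src=ipaddress.ip_network("10.0.1.0/24"),
        dst=ipaddress.ip_network("203.0.113.0/24"),
        egress_sdr="SDR2",
        vni=5001,
    )
]

steered = select_egress(RULES, "ExternalPeer1", "10.0.1.7", "203.0.113.9")
default = select_egress(RULES, "ExternalPeer1", "10.0.2.7", "203.0.113.9")
print(steered, default)
```

Traffic from 10.0.1.0/24 to 203.0.113.0/24 is tunnelled to SDR2 with the given VNI, while all other traffic keeps the next hop learned from BGP.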

| Flexibility - FW Service

• Support >100k flex rules for FW purpose

Diagram labels: SDR, Data Plane, VPC, VxLAN Fabric, FW Service, External Router, EA

<DIP> --> <FW, VNI>

<SIP> --> <FW, VNI>

| Flexibility - DDoS Service

Diagram labels: SDR, VPC, DDoS Service, EA, External Router, Data Plane

Forwarding entry: 180.10.1.1/32, DDoS

BGP route 180.10.1.1/32

| Flexibility - DDoS Service

• Redirect attack traffic to the DDoS service efficiently

Diagram labels: SDR, VPC, DDoS Service, EA, External Router

Forwarding entries: 180.10.1.1/32, DDoS; 0.0.0.0/0, DP

BGP route 180.10.1.1/32

The Data Plane only processes the real traffic
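The two entries on this slide (180.10.1.1/32 to DDoS, 0.0.0.0/0 to DP) behave like a longest-prefix-match table: the attacked /32 is diverted to the DDoS service and everything else goes to the Data Plane. The following is a minimal sketch of that lookup using only the Python standard library. The helper names `build_table` and `lookup` are hypothetical, and where the table actually lives (the slide places it next to the EA) is not modelled here.

```python
import ipaddress


def build_table(entries):
    """Turn (prefix, target) pairs into (network, target) pairs."""
    return [(ipaddress.ip_network(prefix), target) for prefix, target in entries]


def lookup(table, dip):
    """Longest-prefix match: return the target of the most specific entry."""
    addr = ipaddress.ip_address(dip)
    candidates = [(net.prefixlen, target) for net, target in table if addr in net]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]


# Entries as shown on the slide.
TABLE = build_table([
    ("180.10.1.1/32", "DDoS"),  # attacked address, diverted for scrubbing
    ("0.0.0.0/0", "DP"),        # everything else goes to the Data Plane
])

print(lookup(TABLE, "180.10.1.1"))
print(lookup(TABLE, "180.10.1.2"))
```

Because the /32 is more specific than the default route, only traffic for the attacked address is diverted.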

| Flexibility - Interoperability

• Interoperate with existing traditional routers

• Speed up deployment of SDR

Diagram labels: SDR, Routing Plane, Data Plane, VPC, Existing Commodity Router, MPLS Fabric, MPLS Switch, External Router, eBGP

| Scalability

• Each component scales independently

• Each network can be operated independently

• 3.2 Tbps forwarding capacity

Components:

• NGW: Data Plane

• FCR: Routing Plane

• RNSO: Control Plane

• GNSO: Orchestrator

Diagram labels: CS, AS, NGW, FCR, RNSO, GNSO, eBGP

| Scalability - Hardware Acceleration

• Introduce programmable switch for hardware acceleration

• > 10 Tbps forwarding capacity

Diagram labels: VPC, EA, SDR, External Router, Data Plane, Control Plane, Tencent SmartSwitch

Flow labels: Elephant flow info; Flow offloading; Static flow info for high volume traffic
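The flow labels on this slide suggest that high-volume flows are offloaded to the programmable switch, using both statically configured flows and elephant flows reported at run time. The sketch below illustrates one way such a selection could work. It is an assumption-laden illustration rather than the mechanism from the talk: the function `pick_offload_flows`, the byte threshold, the table capacity, and the rule that static entries are installed first are all hypothetical.

```python
from collections import Counter


def pick_offload_flows(byte_counts, static_flows, threshold_bytes, table_capacity):
    """Choose flows for the switch: static entries first, then largest elephants."""
    # Deduplicate static flows while keeping their configured order.
    offload = list(dict.fromkeys(static_flows))[:table_capacity]
    chosen = set(offload)
    # most_common() yields flows from the largest byte count to the smallest.
    for flow, nbytes in byte_counts.most_common():
        if len(offload) >= table_capacity:
            break
        if nbytes >= threshold_bytes and flow not in chosen:
            offload.append(flow)
            chosen.add(flow)
    return offload


# Illustrative per-flow byte counters, keyed by (src, dst) pairs.
counts = Counter({
    ("10.0.0.1", "198.51.100.1"): 900,
    ("10.0.0.2", "198.51.100.2"): 50,
    ("10.0.0.3", "198.51.100.3"): 700,
    ("10.0.0.4", "198.51.100.4"): 1200,
})
static = [("10.0.9.9", "198.51.100.9")]

selected = pick_offload_flows(counts, static, threshold_bytes=500, table_capacity=3)
print(selected)
```

With a capacity of three entries, the static flow and the two largest elephant flows are offloaded; the remaining flows stay on the software Data Plane.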

| Reliability – NSF & NSR

• Single node failure will not affect the system

• Data Plane supports Non-stop forwarding (NSF)

• Routing Plane supports Non-Stop Routing (NSR)

Diagram labels: External Router1, External Router2, Routing System, Routing Plane1, Routing Plane2, Control System, Control Plane, Forwarding System, Data Plane, NGW

| Operationality - Monitoring

• Three-level Data Plane probing: cluster-level, server-level, and core-level health checks

• Critical resources monitoring

• Various statistics and events

Diagram labels: Data Plane cluster, server0, server1, core0, corex, RMOS
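The three probing levels (cluster, server, core) can be pictured as a roll-up of per-core probe results. The sketch below shows one possible aggregation under stated assumptions: the function `evaluate`, the nested-dictionary input, the rule that a server is unhealthy if any of its cores fails a probe, and the `min_healthy_servers` threshold are all hypothetical and not taken from the talk.

```python
def evaluate(cluster_probes, min_healthy_servers=1):
    """Roll per-core probe results up to server level and cluster level."""
    failed_cores = []
    failed_servers = []
    for server, cores in cluster_probes.items():
        bad = [core for core, ok in cores.items() if not ok]
        failed_cores.extend((server, core) for core in bad)
        if bad:
            # Assumed rule: any failing core marks the whole server unhealthy.
            failed_servers.append(server)
    healthy_servers = len(cluster_probes) - len(failed_servers)
    return {
        "cluster_ok": healthy_servers >= min_healthy_servers,
        "failed_servers": failed_servers,
        "failed_cores": failed_cores,
    }


# Illustrative probe results: one core on server1 did not answer.
probes = {
    "server0": {"core0": True, "core1": True},
    "server1": {"core0": True, "core1": False},
}

report = evaluate(probes)
print(report)
```

Reporting the failing core and server separately lets an operator tell a single stuck forwarding core apart from a whole-server or whole-cluster outage.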

| Operational Experience

• Move manual configuration into the centralized orchestrator as much as possible.

• Provide a robust “One-Click” operation to quickly turn off the whole system.

• Keep the message queues between components reliable and efficient.

| Future Work

• End-to-End network quality detection and analysis system for different network layers

• Automatic traffic engineering based on more network metrics like latency, link utilization…

• Simulation and verification system to detect and fix abnormal behaviors in advance

| Conclusion

• Disaggregate functionalities into individual components

• High scalability of each component at each level

• Fast feature velocity via software programming

• Low Cost

Diagram labels: switch, switch, …, Data Plane, Routing Plane, Control Plane, Orchestrator, Scalability, Flexibility

Thanks