Design of the MultiService Router (MSR): A Platform for Networking Research
Transcript of Design of the MultiService Router (MSR): A Platform for Networking Research
Washington University in St. Louis
Fred Kuhns
Washington University
Applied Research Laboratory
http://www.arl.wustl.edu/arl/project/msr
Fred Kuhns - 1/7/02
Presentation Overview
• Overview: – Motivation, Context and Requirements
• System Architecture
• IP Forwarding in the MSR
• Introduction to the Control Protocol
Motivating Example
[Figure: session establishment; open_session(type, args); code servers supply the application spec and plugin code; determine best configuration; add destination host]
• Gigabit links
• Traffic isolation
• Security and DoS
• Rapid prototyping
• Experimental protocols
• Resource reservations and guarantees
• Embedded applications and active processing
Identifying Design Requirements
• MSR must support Ethernet or ATM using PVCs
• Multiple hosts/routers per physical interface
• One or more routers per network/LAN
[Figure: access routers (AR) and core routers (CR) connecting hosts on LANs A and B through networks A-G via ports 1 and 2]
MSR Project Goals
• Develop an open, high-performance, dynamically configurable multiservice routing platform for use in networking research:
– Support gigabit link speeds
• independent of specific link technologies
– Port Processor independence: SPC, FPX or both
• Configuration discovery at initialization time
– Optimized packet processing in hardware (FPX)
• IP forwarding and advanced packet scheduling
• Active processing in hardware (FPX)
• Support prototyping of new functions in software (SPC) before migrating to the FPX
MSR Project Goals
– Create a framework for experimenting with new protocols for traditional routing or resource management (QoS)
• Extensible and robust software frameworks
– Router control and resource management
– Support conventional routing protocols such as OSPF
• Avoid needless abstractions
– Leverage existing and legacy efforts
• Leverage existing code or libraries
• Gigabit Kits program, Extreme Networking project and Programmable Networks vision
Three Core Areas
• Hardware architecture and performance
– High-performance forwarding path and core interconnect
– Hardware components: WUGS, APIC, SPC and FPX provide core components
• Top-Level Functional Requirements (Framework)
– Captures the management and control operations. In the MSR, most top-level functions are implemented on a centralized control processor.
• Port-Level Functional Requirements (Framework)
– The IP forwarding path, active processing and port-level control functions. Also statistics and monitoring.
Top-Level Framework
• System Initialization: default routes and resources
• Routing and Signaling
– Extensible framework for routing protocol support (Zebra)
• Active routing: extended version of OSPF (Ralph Keller)
– Associating received LSAs with the correct MSR interface
– Forwarding table management/distribution (SPC and FPX)
• Local Management and Control
– Configuration discovery and initialization
– Monitor and control local resources
– Resource reservations: admission control, allocation and policing
– API for plugin loading/unloading, instance creation/deletion, binding instances to filters, filter creation/deletion
Port-Level Functional Requirements
• Control and management interface
• Per-flow or flow-aggregate resource allocation
– Packet classification and general filters
– Distributed Queuing (DQ)
– Fair queuing at output ports (DRR)
– Flow-based resource allocations
• IP Forwarding
– Standard IP best-effort forwarding
– Reserved flow forwarding - exact match
• Active Processing Support
– Plugin environment
– Monitor and control resource usage
– Dynamic plugin download and execution
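Deficit Round Robin (DRR), named above as the output-port fair queuing scheme, can be sketched as follows. This is a minimal illustrative sketch; the class name, quantum value and per-flow queue structure are not taken from the MSR code.

```python
from collections import deque

class DRRScheduler:
    """Minimal Deficit Round Robin sketch: each backlogged queue earns a
    quantum of byte credit per round and sends packets while credit lasts."""

    def __init__(self, quantum):
        self.quantum = quantum   # bytes of credit added per round
        self.queues = {}         # flow id -> deque of packet lengths (bytes)
        self.deficit = {}        # flow id -> accumulated byte credit

    def enqueue(self, flow, pkt_len):
        self.queues.setdefault(flow, deque()).append(pkt_len)
        self.deficit.setdefault(flow, 0)

    def round(self):
        """Serve each backlogged queue once; return list of (flow, pkt_len) sent."""
        sent = []
        for flow, q in self.queues.items():
            if not q:
                continue
            self.deficit[flow] += self.quantum
            while q and q[0] <= self.deficit[flow]:
                pkt = q.popleft()
                self.deficit[flow] -= pkt
                sent.append((flow, pkt))
            if not q:
                self.deficit[flow] = 0   # idle queues do not bank credit
        return sent
```

With a 500-byte quantum, a flow with two 400-byte packets sends one packet per round while a flow with a single 300-byte packet drains immediately, which is the bytewise fairness DRR is meant to provide.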
Status: Complete
• Initial architectural design and implementation complete:
– software modules and hardware components
– Core software module design and implementation complete: general filter, exact match classifier, FIPL, distributed queuing, active processing, virtual interfaces, port command protocol.
• Testing of IP forwarding function in FPX complete.
• Initial DQ testing complete
• Queue State DRR simulation and initial integration complete.
Status: Current Effort
• Analysis and enhancement of the
– Distributed Queuing algorithm and its implementation
– exact match classifier and route pinning with reservations
• Validate DRR
• Test and verify Wave Video plugin and demo
• Simple flow entry and route cache timeouts
• Validate plugin bindings to exact match classifier
• Routing protocol support (OSPF and Zebra)
Services Planned for Development
• Extreme Networking - http://www.arl.wustl.edu/arl/projects/extreme:
– Lightweight Flow Setup Service (LFS)
• one-way unicast flow with reserved bandwidth, soft-state
• stable rate and transient rate reservations
– Network Access Service (NAS)
• provides controlled access to LFS
• registration/authentication of hosts, users
• resource usage data collection for monitoring, accounting
– Reserved Tree Service (RTS)
• configured, semi-private network infrastructure
• reserved bandwidth, separate queues
• paced upstream forwarding with source-based queues
Future Extensions to Basic MSR
• Per-source aggregate queues based on source prefix
– DRR service; discard policy is longest queue with hysteresis, discard from the front
• Super-scalable packet scheduling
– approximate radix sort w/compensation (timing wheels)
• Lightweight flow setup protocol implementation
– flow identification in SPC, returns virtual output queue
– framework for managing BW allocations and unfulfilled requests
– interface to NAS
• Reserved Tree Service: hardware only
• Enhanced (per-flow?) Distributed Queuing
• NAS implementations:
– User authentication, policy enforcement, monitoring & feedback
Presentation Overview
• Overview of Project and status
• System Architecture
• IP Forwarding in the MSR
• Control Path
• SPC kernel processing
• Control Protocol
MSR Hardware Components
[Figure: the control processor attaches to the ATM switch core (switch fabric); each port pairs a Port Processor (PP) with a Line Card (LC)]
Port Processors: SPC and/or FPX
[Figure: same layout as the previous slide; each port processor comprises an FPX and/or SPC between its Line Card (LC) and the switch's input/output port processors (IPP/OPP)]
Example SPC and FPX Design
[Figure: in the SPC, packets pass via the APIC through the IP classifier, DQ module and active processing; the FPX side holds the NID, with flow control between SPC and FPX]
Shim contains results of classification step.
MultiService Router - Overview
[Figure: WUGS core with port processors running the forwarding path (classify/lookup, DQ at inputs, DRR at outputs) and plugins in a processing environment; the CP runs MSR control: the MSR Manager, routing (flexroutd, RA, OSPF), signaling agents (flexsig, RSVP) over a local interface framework, and GBNSC configuration of switch & ports; a NOC Net Manager App and GUI (NCMO/Jammer) attaches externally]
CP - Control Processor
RA - Route Agents
MM - MSR Manager
PP - Port Processor (SPC/FPX)
PE - Processing Environment
DQ - Distributed Queuing
DRR - Deficit Round Robin
FP - Forwarding Path
Top-Level Components
• MultiService Router (MSR)
– Control Processor (CP): system monitoring and control:
• MSR Manager (MM): router configuration; signaling protocol; forwarding db and classifier rule set management; system monitoring; port-level resource management and control; local admission control; discovers hardware configuration at startup.
• Routing Agent (RA): local instance of routing protocols; communicates with remote entities and sends route updates to the RM. Currently Zebra based.
• WUGS switch controller (GBNSC), used for monitoring functions: sends control cells to the WUGS to read statistics and configuration information.
– Port Processor (PP): port-level resource management.
• Forwarding Path (FP): modules/components performing the core IP forwarding and port-level control operations: route lookup, flow classification, filtering, distributed queuing and fair output queuing.
• Processing Environment (PE): infrastructure supporting active processing of IP datagrams. Application flows are bound to a software (or, in the FPX, hardware) processing module.
Top-Level Components
• Network Operations Center (NOC) - GUI interface
– Network Management Application
• Active metric collection
• Passive monitoring of DQ. Display considerations include format, temporal resolution and processing overhead.
• Metric and display evaluation
• Active management not implemented.
– Supports MSR Testing
• test/demo configuration and setup
• identify meaningful metrics and display architecture
– Display and "manage" MSR configuration
• interface to init MSR, change per-port attributes
• reset MSR
• set runtime parameters
The Control Processor
[Figure: MSR Manager software stack on the CP: Configuration and Operational Control over an MSR Abstraction Layer (Policy Manager, Resource Manager, IP Routing Manager, MSR wrappers); libraries for SPC command/message, FPX control logic/cells, switch & APIC control/cells and GBNSC config; flexsig signaling; all over the INET API (TCP/UDP/IP), the native ATM ("raw") library and the common APIC device driver code]
Control Processor Tasks
• The top-level framework currently supports:
• MSR initialization
• Data collection for the management tool
• Test and demo environment support
• Routing support - in development
– OSPF - standard routing protocols
• Local resource management and global signaling
– Monitor resource usage
– Support active processing - plugins
Requirements - RM
• Resource identification and initialization
• Create and distribute routing tables to port processors
– constructed using OSPF and a QoS routing protocol
• Distributed queuing (DQ) management
– reserves output link BW and sets policies
• Allocation of resources within individual ports
– Static allocation - configuration script or admin commands
– Dynamic allocation or re-allocation
• out-of-band allocation by manager
• out-of-band allocation by signaling agents
• in-band as needed by particular services
Distributed Routing Table Admin
[Figure: IP resource management on the CP merges the OSPF route tables and the RSVP/OSPF# flow table, applies admission control for VC space and BW, and distributes the merged tables to ports 1..N]
Routing Support
• Context: Programmable Networks
• Focus: Software Component Architecture
• Issues:
– Building, maintaining and distributing route tables
– Delivery of updates (LSAs from neighbors) to the CP
– Format of route table entries
– Support for logical interfaces (sub-interfaces)
– CP component interactions (APIs):
• Routing - Zebra, OSPFd and msr_route_manager
• signaling and resource manager
• Assumptions:
– All routing neighbors are directly attached
Presentation Overview
• Overview of Project and status
• System Architecture
• IP Forwarding in the MSR
• Control Path
• SPC kernel processing
• Control Protocol
MSR 0.5 - Basic IP Forwarding
• Core functions implemented in the basic MSR system (aka Version 0.5)
– Control Processor
• System monitoring
• System configuration
– Port-level software (SPC):
• Fast IP Lookup (FIPL) and table management
• APIC driver (the engine)
• MSR memory management (buffers)
• High-priority periodic callback mechanism
• Distributed Queuing
• General classifier and an active processing environment
Phase 0.5 - Basic IP Forwarding
[Figure: WUGS with an SPC/FPX at each port, one connected IP entity (router) per port, control traffic to the CP and a loopback path; distributed queuing not shown]
Distributed Queuing
• Goals
– Maintain high output link utilization
– Avoid switch congestion
– Avoid output queue underflow
• Mechanisms
– Virtual Output Queuing (VOQ) at the inputs
– Broadcast VOQ and output backlogs every D sec
– Each PP_i recalculates rate(i,j) every D sec
DQ - Cell Format
Broadcast DQ summary cells every D sec (sent on VCI = DQVC). Fields:
• Src port - sending port number (0-7)
• Overall Rate - total aggregate rate (BW) allocated to this port for the port-to-switch connection (currently not used)
• Output queue length - bytes queued in the output port's output queue
• VOQ X Queue Length - number of bytes queued in the src port's VOQ for output port X (X = 0..7)
[Cell layout: cell header; src port; overall rate; output queue length; VOQ 0-7 queue lengths; padding]
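The summary-cell fields above can be packed and parsed with a short sketch. The exact on-wire widths and ordering are assumptions (network-order 32-bit words in the order the slide lists them), not the MSR cell definition.

```python
import struct

# Assumed layout: src port, overall rate, output queue length,
# then eight VOQ queue lengths, all as big-endian 32-bit words.
DQ_FMT = "!3I8I"

def pack_dq_summary(src_port, overall_rate, out_qlen, voq_lens):
    """Build the payload of one DQ summary cell for this port."""
    assert len(voq_lens) == 8, "one VOQ length per output port"
    return struct.pack(DQ_FMT, src_port, overall_rate, out_qlen, *voq_lens)

def unpack_dq_summary(payload):
    """Parse a received summary cell payload back into its fields."""
    size = struct.calcsize(DQ_FMT)
    src, rate, out_qlen, *voqs = struct.unpack(DQ_FMT, payload[:size])
    return {"src_port": src, "overall_rate": rate,
            "output_qlen": out_qlen, "voq_lens": voqs}
```

A round trip through pack/unpack recovers the same fields, which is the property each port relies on when it reads all broadcast cells at the start of a cycle.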
MSR Router: Distributed Queuing
[Figure: eight ports P0-P7 (each an SPC/FPX with a DQ module, per-output VOQs and output queues) around the WUGS, with the CP and next/previous hops on subnets 192.168.200.X-192.168.207.X. At each port, DQ runs every D sec:
1) Determine the per-output-port queue depths.
2) Create a DQ summary cell for this port and broadcast it to all input ports (including self).
3) DQ summary cells wait in queue for the start of the next cycle.
4) Read all summary cells (including the port's own) and calculate the output rate for each VOQ.
5) DQ updates the packet scheduler to pace each VOQ according to its backlog share.]
Distributed Queuing Algorithm
• Goal: avoid switch congestion and output queue underflow.
• Let hi(i,j) be input i's share of the input-side backlog for output j.
– switch congestion can be avoided by sending from input i to output j at rate L·S·hi(i,j)
– where L is the external link rate and S is the switch speedup
• Let lo(i,j) be input i's share of the total backlog for output j.
– underflow of the queue at output j can be avoided by sending from input i to output j at rate L·lo(i,j)
– this works if L·(lo(i,1)+···+lo(i,n)) ≤ L·S for all i
• Let wt(i,j) be the ratio of lo(i,j) to lo(i,1)+···+lo(i,n).
• Let rate(i,j) = L·S·min(wt(i,j), hi(i,j)).
• Note: the algorithm avoids congestion, and avoids underflow for large enough S.
– what is the smallest value of S for which underflow cannot occur?
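The rate calculation above can be sketched in a few lines. The min() combination of wt(i,j) and hi(i,j) is a reconstruction of a garbled formula in the transcript, and the function/variable names (dq_rates, voq, out_q) are illustrative; inputs come from the broadcast summary cells.

```python
def dq_rates(voq, out_q, L, S):
    """Compute rate(i,j) = L*S*min(wt(i,j), hi(i,j)) for all input/output pairs.

    voq[i][j]  -- input i's VOQ backlog (bytes) for output j
    out_q[j]   -- output j's output-queue backlog (bytes)
    L, S       -- external link rate and switch speedup
    """
    n = len(voq)
    hi = [[0.0] * n for _ in range(n)]
    lo = [[0.0] * n for _ in range(n)]
    for j in range(n):
        in_backlog = sum(voq[i][j] for i in range(n))   # input-side backlog to j
        total = in_backlog + out_q[j]                   # total backlog for j
        for i in range(n):
            hi[i][j] = voq[i][j] / in_backlog if in_backlog else 0.0
            lo[i][j] = voq[i][j] / total if total else 0.0
    rate = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lo_sum = sum(lo[i])                             # lo(i,1)+...+lo(i,n)
        for j in range(n):
            wt = lo[i][j] / lo_sum if lo_sum else 0.0   # wt(i,j)
            rate[i][j] = L * S * min(wt, hi[i][j])
    return rate
```

When a single input holds all backlog for an output, hi = lo = wt = 1 and that input is paced at the full speedup rate L·S, matching the intent of the slide.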
MSR IP Data Path - An Example
[Figure: a packet from host 192.168.200.2 enters port P0, is IP-forwarded at the ingress SPC/FPX, crosses the WUGS to port P4, and is sent on to 192.168.204.2; each port P0-P7 serves one 192.168.200.X-192.168.207.X subnet and runs DQ]
MSR Version 1.0 - Enhanced
• Control Processor
– Configuration discovery
– Enhanced download with broadcast
– Command protocol implementation
• Port-level software (SPC):
– Virtual interface support and shim processing
– Dynamic update of FIPL table
– MSR memory management enhancements
– Distributed Queuing
– Exact match classifier
– Command protocol implementation
– Embedded debug facility
Virtual Interfaces on ATM
[Figure: an ATM switch connects the CP, hosts and routers to MSR ports over PVCs (e.g. VC=50, VC=51); per-port SPC/FPX lookup maps VCs to virtual interfaces; where no PVC exists there is no traffic to/from the MSR]
Internal MSR Connections (SPC only)
[Figure: per-port SPC lookup/out processing maps VCs 50-53 to virtual ports VP0-VP3; the CP runs IP (udp/tcp) with sockets for the RA (ospf) and RM (config) plus a raw ATM socket; control VCs connect the CP (port 0) to each port]
• Port loopback not shown
• IP addresses are bound to virtual interfaces, or virtual ports (VP)
• Only the IP forwarding path is shown
• IP layer: routes packets to/from sockets
• Sockets: communication endpoints
• Driver: routes packets between interface and net layer
Packet Routing, SPC and FPX
[Figure: ingress path: FPX (FIPL lookup, shim processing, add shim) and SPC (shim demux/update, IP proc, plugins) forward onto internal VCs 64..127 (out port + 64); egress path: shim demux/update, remove shim, then out to next-hop routers on point-to-point VCs; VC 63 carries control]
• Inbound VC = SPI + 128, 0 <= SPI <= 15
• Outbound VC = SPI + 128, 0 <= SPI <= 15
• Currently support at most 4 inbound VCs: one for Ethernet or four for ATM
• Current VCI support: 1) 64 ports (PN), 2) 16 sub-ports (SP)
• IP eval: IP processing handed from FPX to SPC for: 1) broadcast and multicast destination address, 2) IP options, 3) packet not recognized
• Ethernet only: VCs to/from end stations
Using Virtual Interfaces
[Figure: shim processing at each SPC/FPX port; example virtual-interface table:]
VIN  IP/Mask            VC  Port  SubPort
0    192.168.200.1/24   50  0     0
1    192.168.201.1/24   50  1     0
2    192.168.202.1/24   51  1     1
3    192.168.203.1/24   50  2     0
4    192.168.204.1/24   51  2     1
5    192.168.205.1/24   52  2     2
6    192.168.206.1/24   50  3     0
7    192.168.207.1/24   51  3     1
8    192.168.208.1/24   50  4     0
Input port: lookup IP route, insert shim, send to output.
Output port: reassemble frame, get out VIN from shim, remove shim, send to next hop/host.
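The virtual-interface table above can be exercised with a small sketch that maps a destination address to its VIN row to recover the outgoing VC, port and sub-port. The addresses and VCs come from the example table; the linear /24 scan stands in for the real FIPL lookup and is purely illustrative.

```python
import ipaddress

# Example VIN table from the slide: VIN -> (interface prefix, VC, port, sub-port)
VIN_TABLE = {
    0: ("192.168.200.1/24", 50, 0, 0),
    1: ("192.168.201.1/24", 50, 1, 0),
    2: ("192.168.202.1/24", 51, 1, 1),
    3: ("192.168.203.1/24", 50, 2, 0),
    4: ("192.168.204.1/24", 51, 2, 1),
    5: ("192.168.205.1/24", 52, 2, 2),
    6: ("192.168.206.1/24", 50, 3, 0),
    7: ("192.168.207.1/24", 51, 3, 1),
    8: ("192.168.208.1/24", 50, 4, 0),
}

def lookup_vin(dst):
    """Return the VIN row whose /24 prefix contains dst, or None."""
    addr = ipaddress.ip_address(dst)
    for vin, (net, vc, port, sub) in VIN_TABLE.items():
        if addr in ipaddress.ip_network(net, strict=False):
            return {"vin": vin, "vc": vc, "port": port, "subport": sub}
    return None
```

For instance, a packet for 192.168.204.7 resolves to VIN 4, i.e. VC 51 on port 2, sub-port 1, which is what the output port needs after reading the shim.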
Packet Forwarding
[Figure: an IP packet (src 192.168.220.5, dst 192.168.200.1, sport 5050, dport 89) arrives at port 5; the CP runs Configure/Resource/Signaling, Discover (switch & ports), RM, and routing (flexroutd, zebra, RA/OSPF)]
Port 5, sub-ports 0-3:
VIN 20: 192.168.220.1/24: ext. link VC 50
VIN 21: 192.168.221.1/24: ext. link VC 51
VIN 22: 192.168.222.1/24: ext. link VC 52
VIN 23: 192.168.223.1/24: ext. link VC 53
Port 0, sub-ports 0-3:
VIN 0: 192.168.200.1/24: ext. link VC 50
VIN 1: 192.168.201.1/24: ext. link VC 51
VIN 2: 192.168.202.1/24: ext. link VC 52
VIN 3: 192.168.203.1/24: ext. link VC 53
Forwarding IP Traffic
[Figure: at port 5's ingress, the packet's destination is looked up: 192.168.200.1/24 matches, giving out VIN 0]
• Driver reads the header and performs the route lookup, which returns fwd_key:
– fwd_key = {Stream ID, Out VIN}
– SID = reserved session ID, local only
– VIN = {Port (10 bits), SubPort (6 bits)}
• Insert shim, update AAL5 trailer and IP header
• Calculate the internal VC from the output VIN's port number (here VC = 40)
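The lookup step above returns fwd_key = {Stream ID, Out VIN}; the sketch below derives the internal VC from the output VIN's port number. The VIN bit packing follows the slide (10-bit port, 6-bit sub-port), but the base internal VC of 40 is an assumption read off this example's figure, not the MSR specification.

```python
INTERNAL_BASE_VC = 40   # assumed: internal VC = base + output port (VC = 40 here)

def make_vin(port, subport):
    """Virtual Interface Number: 10-bit port number | 6-bit sub-port."""
    return ((port & 0x3FF) << 6) | (subport & 0x3F)

def internal_vc(fwd_key):
    """Derive the internal (switch-side) VC from fwd_key's output VIN."""
    out_port = (fwd_key["out_vin"] >> 6) & 0x3FF
    return INTERNAL_BASE_VC + out_port

# Example from the slide: route lookup on 192.168.200.1 yields out VIN 0
fwd_key = {"sid": 7, "out_vin": make_vin(0, 0)}
```

Under this assumption, out VIN 0 (port 0, sub-port 0) maps to internal VC 40, matching the value quoted on the slide.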
Internal IP Packet Format
[Diagram: shim (8 bytes) + IP header + IP datagram + AAL5 trailer]
• IP header: Version, H-length, TOS, Total length; Identification, Flags, Fragment offset; TTL, Protocol, Header checksum; Source Address; Destination Address; IP Options (if present)
• IP data (transport header and transport data)
• AAL5 trailer: padding (0-40 bytes); CPCS-UU (0); Length (IP packet + LLC/SNAP); CRC (APIC calculates and sets)
IntraPort Shim: Field Definitions
• Flags - used by the SPC to demultiplex incoming packets. The FPX sets flags to indicate the reason for sending a packet to the SPC. Note: flags may also be used to implement flow control.
– AF: Active Flow
– NR: No route in table
– OP: IP options present (correct version but incorrect header size)
– UK: Unknown packet type (incorrect version, for example)
• Stream Identifier (SID): identifier for reserved traffic; a locally unique label.
– FPX fills this in for reserved flows.
• Input VIN - the physical port and sub-port the packet arrived on. PN is the physical port number and SPI is the sub-port identifier. There is a set map from SPI to VCI.
– FPX sets these values. Not clear that we need this in the IntraPort shim.
– VCI = Base VC + SPI
• Output VIN - output port and sub-port.
– The FPX sets this if the route lookup succeeds.
– If the SPC performs the lookup for the FPX, then the SPC fills it in.
– The SPC may also modify this value in order to re-route a packet; modifying seems dangerous, but setting is ok.
[Shim layout: Flags (AF, NR, OP, UK), Not Used, Stream Identifier; Input VIN, Output VIN. Virtual Interface Number format: PN (10 bits), SPI (6 bits)]
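A pack/unpack sketch of the shim fields just defined follows. The flag bit positions and the word layout (flags and SID in one 32-bit word, input/output VINs in a second) are assumptions inferred from the slide's garbled bit ruler, not the MSR definition.

```python
# Assumed flag bit positions within the shim's flags field.
AF, NR, OP, UK = 1 << 0, 1 << 1, 1 << 2, 1 << 3

def make_vin(pn, spi):
    """Virtual Interface Number: 10-bit physical port | 6-bit sub-port id."""
    return ((pn & 0x3FF) << 6) | (spi & 0x3F)

def split_vin(vin):
    """Recover (PN, SPI) from a packed VIN."""
    return (vin >> 6) & 0x3FF, vin & 0x3F

def pack_shim(flags, sid, in_vin, out_vin):
    """Pack the shim as two 32-bit words (assumed word order)."""
    word0 = ((flags & 0xFFFF) << 16) | (sid & 0xFFFF)
    word1 = ((in_vin & 0xFFFF) << 16) | (out_vin & 0xFFFF)
    return word0, word1

def unpack_shim(word0, word1):
    return {"flags": word0 >> 16, "sid": word0 & 0xFFFF,
            "in_vin": (word1 >> 16) & 0xFFFF, "out_vin": word1 & 0xFFFF}
```

The round trip shows, for example, an NR-flagged packet that arrived on port 5, sub-port 1 keeping both facts intact across the switch.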
InterPort Shim: Field Definitions
• Used to convey forwarding information to the output port. Currently only the output SPI is necessary for forwarding.
• Flags: TBD
• Input VIN - same as IntraPort shim.
– Set by the ingress FPX, or by the SPC when the FPX is not used.
• Output VIN - same as IntraPort shim.
[Shim layout: Flags, Not Used, Input VIN, Output VIN. Virtual Interface Number format: PN (10 bits), SPI (6 bits)]
FIPL Table Entry Formats
• FPX version of FIPL table entry (36 bits): A bit, Stream Identifier, Output VIN; remaining bits TBD
• SPC version of FIPL table entry (32 bits): Stream Identifier (bits 31-16), Output VIN (bits 15-0)
• Virtual Interface Number format: PN (10 bits), SPI (6 bits)
Forwarding IP Traffic
[Figure: the packet arrives at port 0; a second example shows a route advertisement (src 192.168.100.5, dst 192.168.214.1) carrying a shim to VIN 0]
• At port 0, the driver extracts the shim and determines the destination VIN
• The output VIN is converted to the output VC = 50
• Removes the shim, updates the AAL5 trailer and sends on VC 50 (in this case the packet goes to the CP)
Example: Processing Route Updates
[Figure: the route advertisement reaches the CP via a socket]
• CP kernel delivers the packet to the socket layer
• Packet is read by OSPFd
• The sender's IP address is mapped to an interface ID
• OSPFd and Zebra associate the received advertisement with an MSR input VIN (VIN 20 in this case)
Note on Sockets, IP and Interfaces
[Figure: a packet (src 192.168.100.5, dst 192.168.205.1) arrives at port 7 on a different VP]
• What if a packet arrives on an interface with a different address?
• Do we care?
• Example: a packet sent to the CP arrives at port 7 on a different VP.
• The CP kernel will still deliver the packet to the socket bound to 192.168.200.1.
• If all neighbor routers are directly attached, then it doesn't matter: we can distinguish them by looking at the sending IP address.
Example: Processing Route Updates
[Figure: route updates flow from OSPFd through Zebra to the MSR Route Manager on the CP]
• OSPFd notifies Zebra of any route updates.
• Zebra provides a common framework for interacting with the system.
• Zebra passes route updates to the MSR Route Manager.
Route Distribution and Port Updates
[Figure: for 192.168.205.1, the CP adds the destination vector and output port/VC, then "broadcasts" the update to all port processors]
Presentation Overview
• Overview of Project and status
• System Architecture
• IP Forwarding in the MSR
• Command Protocol and Debugging Messages
– sendcmd() and the CP to PP Control Path
MSR Command Overview
• The CP may send commands to individual ports (SPCs) using the command protocol; use the sendcmd utility.
• A specific command type designates the SPC kernel modules to receive the command message. A sub-command may also be specified to further identify the particular operation to be performed.
• Command Protocol description
– The Control Processor sends a command message to a specific port and expects to receive a reply message indicating either Success or Failure. This is termed a Command Cycle.
– There is the notion of a Command Transaction, which may include one or more command cycles. A command transaction is terminated when the target (port) responds with a reply message containing an EOF.
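The command cycle and command transaction just described can be sketched as a simple loop. This is an illustrative shape only: the transport callback (send_to_port) and the reply dictionary keys are assumptions, not the MSR wire format.

```python
def run_transaction(send_to_port, port, cmd, subcmd=None):
    """Run command cycles against `port` until a reply carries EOF.

    Each call to send_to_port() is one command cycle: request out,
    Success/Failure reply back. The transaction is the whole sequence.
    """
    replies = []
    while True:
        reply = send_to_port(port, cmd, subcmd)   # one command cycle
        if reply["status"] != "Success":
            raise RuntimeError(f"port {port}: {cmd} command failed")
        replies.append(reply)
        if reply.get("eof"):                      # target signalled end of transaction
            return replies
```

A multi-reply command such as a port printing its filter list would return several Success replies, with only the last one marked EOF.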
Predefined Top-Level Commands
• policy - manage MSR policy object (see description)
• port_init - set local port number, enable DQ
• fipl - send route updates to port (see the fipmgr utility)
• rp_pcu - send commands to the plugin control unit
• rp_inst - send message to a plugin instance
• rp_class - send message to plugin base class
• apic - send commands to the APIC driver
• stats - report port-level statistics
• cfy - classifier interface
• set_debug/get_debug - set/get debug flags and mask
• port_up, port_down, dq and perf - not implemented
Policy Object Commands
• The policy object is referenced during runtime
– modifications will generally have an immediate effect
– get_XXX returns the current value, set_XXX sets it to a new value
• get_gen/set_gen - enable/disable general classifier
• get_iprt/set_fipl/set_simple - specify which route lookup algorithm to use: FIPL or simple linear lookup
• get_dflags/set_dflags - specify where to send debug messages: disable, print to local console or send to CP
• get_drr/set_drr - enable/disable DRR (not implemented)
• get_dq/set_dq - enable/disable DQ (not implemented)
• get_flags/set_flags - return or set all policy control flags
Plugin Framework Enhancements
• Integrated with command framework
– send command cells to the PCU:
• create instance, free instance, bind instance to filter, unbind instance
– Send command cells to particular plugin instances
– Send command cells to plugin base class
– instance access to: plugin class, instance id, filter id
– PCU reports describing any loaded classes, instances and filters
MSR RP_PCU Command
• rp_pcu sub-commands
– addfltr/remfltr - add/remove pkt filter at gate x
– flist - port prints current gate x filter list
– bind/unbind - bind/unbind instance to fltr/gate combo
– create - create plugin instance
– free - remove plugin instance
– clist - port prints current plugin class list
– ilist - port prints current instance list
– load/unload - load plugin - not implemented
– null - no-op; can be used as a ping operation
APIC Command
• apic command - useful for debugging and validating behavior
– info - verbose printing of low-level APIC register, descriptor and buffer state
– resume - specify a suspended channel to resume
– desc - print information about a particular descriptor and any associated buffer
– pace - change the per-VC pacing parameter
– gpace - change the global pacing parameter
Classifier Command
• cfy command - currently needed to perform necessary maintenance
– tbl_flush - flush (i.e. free) idle entries in the classifier table
– rt_flush - flush (remove) cached routes in the classifier table entries
– info - print a list of the active classifier table entries
Statistics Command
• stats command - a potentially useful command set, but currently only nominally supported
– get_all - print all statistics using the DEBUG facility
– reset - reset all counters
MSR DEBUG Command
• Debug facility - mechanisms for sending text messages
– defines a category and level
– Integrated with command framework
• Dynamic setting of the debug mask - affects which messages are sent, based on the category and level masks
• set_debug or get_debug command
– Valid debug categories/modules:
• apic, ipfwd, ingress, egress, iprx, iptx, mem, dq, stats, ctl, conf, kern, natm, pcu, plugin, classify, perf, atmrx, atmtx, buff
– Valid debug levels: 0-255
• predefined: verbose, warning, error, critical
• Interface in MSR kernel
– MSR_DEBUG((MSR_DEBUG_<category>|MSR_DEBUG_LEVEL_<level>, "fmt", args));
Example Sending Cmd to Port
[Figure: sendcmd() on the CP issues "create plugin instance: port id = 0, PluginID = 200"; the command travels as a cmd data cell on the msr_ctl VC through the WUGS to the target port's SPC/FPX, which looks up the sub-command, performs the function call, then reports results; reply() returns "plugin instance created: Status, Instance ID" and the command completion status is reported to the application]