CSci8211: SDN Controller Design: ONOS

NOS Case Study: ONOS (Open Network OS) by ON.LAB
Prototype 1
• focus on implementing a global network view
• goals: scale-out & fault tolerance, but not performance
• using (generic) open-source platforms
Prototype 2
• focus on improving performance, esp. event latency
• use RAMCloud for data store, add a cache layer, & “customized” data model
SDN Controller Design Questions
Some Key Questions & Issues:
• How to obtain global (network-wide) information?
• How to perform distributed state management? Time scales of state change dynamics? Consistency issues?
• What are the configurations? Abstractions & APIs?
• How to implement such a Network OS? And will it really work? E.g., response time & other performance issues?
• How to program control apps? E.g., an SDN programming language?
• Will it scale? Not only in terms of network size, but also # of flows, control apps, etc.
• What about reliability & security issues? … (e.g., inter-operability, evolvability)
• Are there some fundamental design principles we can adopt & apply?
SDN Controller Design
How to design a Network Operating System?
• What features or “abstractions” should be provided by this “Network Operating System”?
• In particular, what should be the “global network view” & “programmatic interfaces” provided to control apps? Or what “low-level” details should be handled by the Network OS?
• And what is the granularity of control allowed to “apps”?
Analogies (& possible differences?): computer OS and (high-level) programming models
• computer architecture: instruction sets, CPU, memory, disks, I/O devices, ...
• (high-level) programming language constructs: statements, data types, functions, …
• OS: (virtual) memory, processes, I/O and drivers, system calls, …
• (distributed) file systems (or databases or data stores): files, directories & permissions, transactions, relations & schemas; vs. disks, ….
NOS Requirements for WANs
Prior Work
Distributed control platform for large-scale networks
• ONIX: closed source; datacenter + virtualization focus
• ONOS design influenced by ONIX
Distributed: ONIX
Single Instance: NOX, POX, Beacon, Floodlight, Trema controllers
Helios, Midonet, Hyperflow, Maestro, Kandoo, …
Community needs an open source distributed network OS
ONOS: Open Network OS
[Figure: applications (Routing, TE, Packet Forwarding, Mobility, Programmable Base Station) run on the Global Network View; underneath, ONOS provides a scale-out design and fault tolerance and controls the network via OpenFlow.]
ONOS Scale-Out
[Figure: a Distributed Network OS of Instances 1, 2 and 3 jointly maintains the Network Graph (global network view) above the data plane.]
• An instance is responsible for maintaining a part of the network graph
• Control capacity can grow with network size or application need
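To make “each instance maintains a part of the network graph” concrete, here is a minimal sketch of sharding switches across instances. The round-robin policy and the names assign_switches and partition are hypothetical illustrations, not ONOS's assignment mechanism (which the slides describe as mastership tracked in a distributed registry).

```python
from collections import defaultdict

def assign_switches(switch_ids, instance_ids):
    """Hypothetical policy: give each switch exactly one master instance,
    dealing switches out round-robin over sorted instance ids."""
    instances = sorted(instance_ids)
    return {sw: instances[i % len(instances)]
            for i, sw in enumerate(sorted(switch_ids))}

def partition(assignment):
    """Invert switch -> master into master -> set of switches it controls."""
    parts = defaultdict(set)
    for sw, inst in assignment.items():
        parts[inst].add(sw)
    return dict(parts)

switches = ["A", "B", "C", "D", "E", "F"]
two = assign_switches(switches, ["ONOS1", "ONOS2"])
three = assign_switches(switches, ["ONOS1", "ONOS2", "ONOS3"])
print({k: sorted(v) for k, v in sorted(partition(three).items())})
# {'ONOS1': ['A', 'D'], 'ONOS2': ['B', 'E'], 'ONOS3': ['C', 'F']}

# Switches whose master changes when a third instance joins the cluster.
moved = [sw for sw in switches if two[sw] != three[sw]]
print(moved)  # ['C', 'D', 'E', 'F']
```

Round-robin keeps the parts equal in size, but adding one instance here moves 4 of the 6 switches to a new master. This is the kind of cost the summary's remark about “more sophisticated algorithms” for controller assignment points at; consistent hashing, for example, would move fewer switches.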
ONOS Control Plane Failover
[Figure: a Distributed Network OS of Instances 1, 2 and 3 shares a Distributed Registry; switches A to F connect three hosts. Every instance sees the same registry entry for switch A as it goes through three states:
1. Master Switch A = ONOS 1; Candidates = ONOS 2, ONOS 3
2. after ONOS 1 fails: Master Switch A = NONE; Candidates = ONOS 2, ONOS 3
3. Master Switch A = ONOS 2; Candidates = ONOS 3]
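The registry transitions for switch A can be mimicked with a small sketch. In Prototype I this state lives in a strongly consistent Distributed Registry (ZooKeeper); the hypothetical MastershipRegistry class below only models the master/candidate bookkeeping in a single process, with no consensus, sessions, or failure detection.

```python
class MastershipRegistry:
    """Toy, single-process stand-in for a strongly consistent registry:
    per switch, one master plus an ordered list of candidate instances."""

    def __init__(self):
        self.master = {}      # switch -> master instance id, or None
        self.candidates = {}  # switch -> ordered list of standby instances

    def register(self, switch, instance):
        """First instance to register becomes master; later ones queue up."""
        if self.master.get(switch) is None:
            self.master[switch] = instance
            self.candidates.setdefault(switch, [])
        else:
            self.candidates[switch].append(instance)

    def instance_failed(self, instance):
        """Drop a failed instance everywhere and promote the next candidate."""
        for switch in list(self.master):
            if instance in self.candidates[switch]:
                self.candidates[switch].remove(instance)
            if self.master[switch] == instance:
                self.master[switch] = None  # the 'Master = NONE' state
                if self.candidates[switch]:
                    self.master[switch] = self.candidates[switch].pop(0)

reg = MastershipRegistry()
for inst in ["ONOS 1", "ONOS 2", "ONOS 3"]:
    reg.register("A", inst)
print(reg.master["A"], reg.candidates["A"])  # ONOS 1 ['ONOS 2', 'ONOS 3']
reg.instance_failed("ONOS 1")
print(reg.master["A"], reg.candidates["A"])  # ONOS 2 ['ONOS 3']
```

Here the “NONE” state lasts only inside one method call; in a real distributed registry it persists until the failure is detected and a candidate wins the election.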
High Level Architecture: Prototype I
[Figure, top to bottom:
• Applications: Control Application, Control Application
• Global Network View (Distributed Network State): Network Graph Database (Titan), accessed through the Blueprints API with transactional consistency, on top of a Distributed Key-Value Store (Cassandra) with eventual consistency
• Scale-out: Instances 1, 2 and 3, each running an OpenFlow Manager (Floodlight) + Floodlight Drivers
• Coordination: Distributed Registry, strongly consistent (ZooKeeper)
• Hosts]
ONOS Network Graph Abstraction
[Figure: the Network Graph is kept in the Titan Graph DB, which stores its vertices (Id: 1 A, Id: 2 C, Id: 3 B) and labeled edges (Id: 101 to 106) in Cassandra, an in-memory DHT.]
Network Graph
[Figure: switch vertices connect to their port vertices by “on” edges; ports on different switches connect by “link” edges; device vertices connect to ports by “host” edges.]
• Network state is naturally represented as a graph
• Graph has basic network objects like switch, port, device and links
• Application writes to this graph & programs the data plane
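A minimal in-memory sketch of such a graph, assuming the edge labels “on” (port to switch), “link” (port to port) and “host” (device to port) as read from the figure. NetworkGraph and its methods are hypothetical and are not ONOS or Titan APIs.

```python
from collections import defaultdict

class NetworkGraph:
    """Toy property graph: typed vertices joined by labeled, undirected edges."""

    def __init__(self):
        self.vtype = {}               # vertex id -> "switch" | "port" | "device"
        self.adj = defaultdict(list)  # vertex id -> [(edge label, neighbor id)]

    def add(self, vid, vtype):
        self.vtype[vid] = vtype

    def connect(self, a, b, label):
        self.adj[a].append((label, b))
        self.adj[b].append((label, a))

    def neighbors(self, vid, label):
        return [n for lbl, n in self.adj[vid] if lbl == label]

g = NetworkGraph()
for sw, ports in {"s1": ["s1:1", "s1:2"], "s2": ["s2:1", "s2:2"]}.items():
    g.add(sw, "switch")
    for p in ports:
        g.add(p, "port")
        g.connect(p, sw, "on")          # port is "on" its switch
g.connect("s1:2", "s2:1", "link")       # a link joins two ports
for dev, port in {"h1": "s1:1", "h2": "s2:2"}.items():
    g.add(dev, "device")
    g.connect(dev, port, "host")        # device attaches to an edge port

# An app reads the graph: which switch is device h2 behind?
port = g.neighbors("h2", "host")[0]
print(port, g.neighbors(port, "on"))    # s2:2 ['s2']
```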
Example: Path Computation App on Network Graph
[Figure: a Flow path vertex connects by “flow” edges to Flow entry vertices; each flow entry refers to a switch and to its inport and outport, along the switch/port/link graph between two devices.]
• Application computes path by traversing the links from source to destination
• Application writes each flow entry for the path
Thus the path computation app does not need to worry about topology maintenance
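The two steps above, traversing links and then writing one flow entry per hop, can be sketched as follows. Two assumptions are not in the slide: the graph is flattened into a hypothetical list of port-to-port links (LINKS), and “path” means fewest hops, found with breadth-first search. compute_flow_entries is an illustrative name.

```python
from collections import deque

# Hypothetical topology: each link joins (switch, port) to (switch, port).
LINKS = [(("s1", 2), ("s2", 1)), (("s2", 2), ("s3", 1)),
         (("s1", 3), ("s4", 1)), (("s4", 2), ("s3", 3))]

def build_adjacency(links):
    """switch -> list of (local out port, neighbor switch, neighbor in port)."""
    adj = {}
    for (a, pa), (b, pb) in links:
        adj.setdefault(a, []).append((pa, b, pb))
        adj.setdefault(b, []).append((pb, a, pa))
    return adj

def compute_flow_entries(links, src, dst):
    """BFS (fewest hops) between host attachment points src and dst,
    each a (switch, port) pair. Returns one (switch, inport, outport)
    flow entry per switch on the path, or None if no path exists."""
    adj = build_adjacency(links)
    (s_sw, s_port), (d_sw, d_port) = src, dst
    prev = {s_sw: None}  # switch -> (previous switch, its out port, our in port)
    queue = deque([s_sw])
    while queue:
        sw = queue.popleft()
        if sw == d_sw:
            break
        for out_port, nbr, in_port in adj.get(sw, []):
            if nbr not in prev:
                prev[nbr] = (sw, out_port, in_port)
                queue.append(nbr)
    if d_sw not in prev:
        return None
    # Walk back from the destination, emitting one flow entry per switch.
    entries, sw, out_port = [], d_sw, d_port
    while True:
        step = prev[sw]
        in_port = s_port if step is None else step[2]
        entries.append((sw, in_port, out_port))
        if step is None:
            break
        sw, out_port = step[0], step[1]
    return list(reversed(entries))

print(compute_flow_entries(LINKS, ("s1", 1), ("s3", 2)))
# [('s1', 1, 2), ('s2', 1, 2), ('s3', 1, 2)]
```

The app only reads topology and writes flow entries; keeping LINKS current is someone else's job, which is the separation the slide describes.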
Network Graph Representation
[Figure: a Switch vertex (3 properties) and Flow path / Flow entry vertices (shown with 10 and 11 properties) linked by “flow” edges.
• Each vertex is represented as a Cassandra row
• Each edge is represented as a Cassandra column: the column name holds Label id + direction, Primary key, Edge id, Vertex id and Signature properties; the column value holds the Other properties
• Row indices enable fast vertex-centric queries]
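To see why storing edges as columns helps vertex-centric queries, here is a toy wide row. The column key (label id, direction, edge id, neighbor vertex id) is a simplification of the figure's layout (it drops the primary key and signature properties) and uses Python tuples instead of Titan's byte encoding; VertexRow and the label ids are hypothetical. Because keys sort by label first, all edges with one label and direction form one contiguous slice.

```python
import bisect

class VertexRow:
    """One wide row per vertex; each edge is a column kept in sorted key order.

    Column key (simplified, assumed): (label_id, direction, edge_id, other_vertex)
    Column value: the edge's remaining properties.
    """

    def __init__(self, vertex_id, properties):
        self.vertex_id = vertex_id
        self.properties = properties
        self.keys = []    # sorted column keys
        self.values = {}  # column key -> edge properties

    def add_edge(self, label_id, direction, edge_id, other_vertex, props=None):
        key = (label_id, direction, edge_id, other_vertex)
        bisect.insort(self.keys, key)
        self.values[key] = props or {}

    def edges(self, label_id, direction):
        """Vertex-centric query: seek to the first matching key, read a slice."""
        lo = bisect.bisect_left(self.keys, (label_id, direction))
        out = []
        for key in self.keys[lo:]:
            if key[:2] != (label_id, direction):
                break
            out.append(key)
        return out

ON, LINK = 10, 20  # hypothetical label ids
port = VertexRow(2, {"type": "port", "number": 1})
port.add_edge(ON, "OUT", 101, 1)    # port vertex 2 is "on" switch vertex 1
port.add_edge(LINK, "OUT", 105, 6)  # outgoing link to port vertex 6
port.add_edge(LINK, "IN", 106, 6)   # incoming link from port vertex 6
print(port.edges(LINK, "OUT"))      # [(20, 'OUT', 105, 6)]
```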
Network Graph and Switches
[Figure “Network Graph: Switches”: a Switch Manager on each ONOS instance talks OpenFlow (OF) to its switches and records them in the network graph.]
Network Graph and Link Discovery
[Figure “Network Graph: Links”: alongside the Switch Managers (SM), a Link Discovery module on each instance uses LLDP to discover links and adds them to the network graph.]
Devices and Network Graph
[Figure “Network Graph: Devices”: alongside the SM and Link Discovery (LD) modules, a Device Manager on each instance learns hosts from packet-in (PKTIN) messages and adds them to the network graph as devices.]
Path Computation with Network Graph
[Figure “Network Graph: Flow Paths”: alongside SM, LD and Device Manager (DM) modules, a Path Computation module on each instance computes Flows 1 to 8 and writes their flow entries into the network graph.]
Network Graph and Flow Manager
[Figure “Network Graph: Flows”: alongside SM, LD, DM and Path Computation (PC) modules, a Flow Manager on each instance takes the flow entries of Flows 1 to 8 from the network graph and sends them to switches as OpenFlow Flowmod messages.]
Example: A simpler abstraction on network graph?
Logical Crossbar
[Figure: a Logical Crossbar made of virtual network objects (Edge Ports) sits over the real network objects (switches, ports, links, devices, hosts); each Edge Port maps to a physical port.]
• App or service on top of ONOS
• Maintains mapping from simpler to complex
Thus makes applications even simpler and enables new abstractions
More on Network Graph Later
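A sketch of such a crossbar service, assuming it exposes virtual edge ports to apps and delegates routing to any path-computation function. LogicalCrossbar, two_switch_path and all port numbers are hypothetical. The app asks for one virtual connection, and the service expands it into per-switch flow entries on the real network.

```python
from typing import Callable, Dict, List, Optional, Tuple

PhysPort = Tuple[str, int]        # (switch id, port number)
FlowEntry = Tuple[str, int, int]  # (switch, inport, outport)
PathFn = Callable[[PhysPort, PhysPort], Optional[List[FlowEntry]]]

class LogicalCrossbar:
    """Presents the network as one big switch whose ports are edge ports."""

    def __init__(self, edge_ports: Dict[str, PhysPort], compute_path: PathFn):
        self.edge_ports = edge_ports      # virtual port -> physical port
        self.compute_path = compute_path
        self.installed: Dict[Tuple[str, str], List[FlowEntry]] = {}

    def connect(self, vin: str, vout: str) -> List[FlowEntry]:
        """One crossbar connection (vin -> vout) becomes many physical entries."""
        entries = self.compute_path(self.edge_ports[vin], self.edge_ports[vout])
        if entries is None:
            raise ValueError(f"no physical path from {vin} to {vout}")
        self.installed[(vin, vout)] = entries
        return entries

# Stand-in path function for a two-switch line: s1 port 2 <-> s2 port 1.
def two_switch_path(src: PhysPort, dst: PhysPort) -> Optional[List[FlowEntry]]:
    if src[0] == dst[0]:
        return [(src[0], src[1], dst[1])]
    if (src[0], dst[0]) == ("s1", "s2"):
        return [("s1", src[1], 2), ("s2", 1, dst[1])]
    return None

xbar = LogicalCrossbar({"vp1": ("s1", 1), "vp2": ("s2", 3)}, two_switch_path)
print(xbar.connect("vp1", "vp2"))  # [('s1', 1, 2), ('s2', 1, 3)]
```

Passing the path function in keeps the crossbar independent of how routes are computed, so the same virtual view could sit on top of any path computation app.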
Evaluation of Prototype I
Evaluation Setting
• ONOS cluster controls 100s of virtual switches, programming end-to-end flows using the network view
• dynamically adding switches & ONOS instances to the cluster
• failovers in response to ONOS instance shut-downs
• rerouting in response to link failures
Scalability & High Availability (HA)
• ONOS scales with more controller instances, though it needs a low-latency distributed data store
Consistency and Integrity
• Titan maintains the graph's structural integrity on top of Cassandra's eventually consistent data store
• can benefit from some degree of sequencing for deterministic state transitions
Evaluation of Prototype I … Low Performance and Visibility
• latency for event handling much worse than expected, e.g., reacting to a link failure could take up to 30 seconds
• diagnosing performance problems was hard due to the large, complex, generic open-source code base
Data Model Issues & Excessive Data Store Operations
• Titan => model all data objects (e.g., ports, flow entries) as vertices
  - requires indexing vertices by type: index maintenance is a bottleneck
  - maintaining & storing references between many small objects leads to dozens of graph database update operations
• Mapping Titan to Cassandra => excessive data store operations
  - shared table & index: unnecessary contention among “independent” ops
Polling for detecting changes in network states
• high CPU load, increasing delay to events & info. exchange
ONOS Prototype II
Lessons learned from Prototype 1
• more efficient data models to reduce # of data store operations
• fast notification & messaging across ONOS instances
• also, simplified network view APIs
Prototype 2: focus on improving performance, esp. event latency
• RAMCloud for data store: Blueprints APIs directly on top
  - RAMCloud: low-latency, distributed key-value store w/ 15-30 µs read/write
• Optimized data model
  - a table for each type of network object (switches, links, flows, …)
  - minimize # of references between elements: most updates require 1 r/w
• (In-memory) topology cache at each ONOS instance
  - eventually consistent: updates may be received at different times & in different orders
  - atomicity under schema integrity: no r/w by apps in the midst of an update
• Event notifications using Hazelcast, a pub-sub system
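A toy, in-process sketch of why push notification helps compared with Prototype I's polling: every subscribed instance receives a change at the moment it is published, rather than discovering it on its next poll. EventBus is hypothetical and does not reproduce Hazelcast's API, which delivers messages across machines.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

class EventBus:
    """In-process stand-in for a pub-sub system: topic -> subscriber callbacks."""

    def __init__(self) -> None:
        self.subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[dict], None]) -> None:
        self.subscribers[topic].append(callback)

    def publish(self, topic: str, event: dict) -> int:
        """Push the event to every subscriber now; return how many were notified."""
        for callback in self.subscribers[topic]:
            callback(event)
        return len(self.subscribers[topic])

# Each instance's topology cache subscribes instead of polling the data store.
bus = EventBus()
received = {"ONOS 1": [], "ONOS 2": [], "ONOS 3": []}
for inbox in received.values():
    bus.subscribe("topology", inbox.append)

n = bus.publish("topology", {"type": "LINK_DOWN", "src": ("s1", 2), "dst": ("s2", 1)})
print(n, received["ONOS 3"][0]["type"])  # 3 LINK_DOWN
```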
High Level Architecture: Prototype II
[Figure, top to bottom:
• Applications: Control Application, Control Application
• Network View API
• Global Network View (Distributed Network State): ONOS Graph Abstraction over the Network Graph, stored in a Distributed Key-Value Store (RAMCloud) with eventual consistency
• Event Notification (Hazelcast)
• Scale-out: Instances 1, 2 and 3, each running an OpenFlow Manager (Floodlight) + Floodlight Drivers
• Coordination: Distributed Registry, strongly consistent (ZooKeeper)
• Hosts]
ONOS Graph Representation and Topology Cache
Network Topology State and Graph Representation
• ONOS keeps track of info about the infrastructure, making it available to control apps
• both protocol-agnostic & protocol-specific representations of network elements & state; each can be translated into the other
  - model objects (protocol-agnostic) and providers (protocol-specific)
  - model object dependencies: device is a first-class entity
• a table for each type of object in a distributed store: switch, host, port, link, edgelink, path, flow entry, …
Topology Cache
• each instance keeps an in-memory cache of the entire network topology (i.e., not only the part of the store under its mastership)
• apply updates atomically to maintain integrity
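One way to realize “a table per object type” plus “apply updates atomically” is sketched below. TopologyCache, its integrity rules and the all-or-nothing batch policy are illustrative assumptions, not ONOS's actual implementation: a batch is staged on a copy, checked, and swapped in under a lock, so readers see either the old state or the new one.

```python
import threading
from typing import Dict, List, Tuple

class TopologyCache:
    """In-memory copy of the whole topology, one dict ("table") per object type."""

    TYPES = ("switch", "port", "link")

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tables: Dict[str, dict] = {t: {} for t in self.TYPES}

    def snapshot(self) -> Dict[str, dict]:
        with self._lock:
            return {t: dict(rows) for t, rows in self._tables.items()}

    def apply(self, batch: List[Tuple[str, str, str, dict]]) -> bool:
        """batch items: (op, type, key, value) with op in {"put", "remove"}."""
        with self._lock:
            staged = {t: dict(rows) for t, rows in self._tables.items()}
            for op, typ, key, value in batch:
                if op == "put":
                    staged[typ][key] = value
                else:
                    staged[typ].pop(key, None)
            if not self._consistent(staged):
                return False        # reject the whole batch; cache unchanged
            self._tables = staged   # one assignment publishes the new state
            return True

    @staticmethod
    def _consistent(t: Dict[str, dict]) -> bool:
        """Integrity rules: ports need their switch, links need both ports."""
        ports_ok = all(p["switch"] in t["switch"] for p in t["port"].values())
        links_ok = all(ln["src"] in t["port"] and ln["dst"] in t["port"]
                       for ln in t["link"].values())
        return ports_ok and links_ok

cache = TopologyCache()
ok = cache.apply([("put", "switch", "s1", {}), ("put", "switch", "s2", {}),
                  ("put", "port", "s1:2", {"switch": "s1"}),
                  ("put", "port", "s2:1", {"switch": "s2"}),
                  ("put", "link", "L1", {"src": "s1:2", "dst": "s2:1"})])
bad = cache.apply([("put", "link", "L2", {"src": "s1:2", "dst": "s9:9"})])
print(ok, bad, sorted(cache.snapshot()["link"]))  # True False ['L1']
```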
ONOS Global Network View
[Figure: Applications observe and program the Network State:
• Topology (Switch, Port, Link, …)
• Network Events (Link down, Packet In, …)
• Flow state (Flow table, connectivity paths, ...)
Network view objects: Switch, Port, Link, Host, Intent, FlowPath, FlowEntry]
ONOS Topology Cache
[Figure: each instance's topology Cache, captioned “Tell me about your slice?”]
Evaluation of Prototype II
Basic Network State Changes
Evaluation Setting
• 3-node ONOS cluster & a WAN of 81 OF switches w/ avg. 4 ports
• RAMCloud with generic graph data model: 1 r + 8 w to add a sw
• RAMCloud with the new data model: 1 write to add a sw & port
[Figure: ONOS cluster w/ 10 Gb/s Eth. conn.; results w/ Kryo and w/ Google Protocol Buffers]
Evaluation of Prototype II … Handling Network Events
Evaluation Setting
• 6-node ONOS cluster; Mininet net. of 206 soft sw.'s, 416 links
• 16,000 flows; one interface fails: reroute 1000 flows, 6->7 hops
Evaluation of Prototype II … Path Installation
Evaluation Setting
• 6-node ONOS cluster; Mininet net. of 206 soft sw.'s, 416 links
• 15,000 flows preinstalled; add 1000 6-hop flows
Latency (table 3) and Throughput (derived from table 3)
• median throughput: 18,832 paths/sec
ONOS Demo on Internet2
• 6-node ONOS cluster, Mininet topology
• 1,000 affected flows, 6-hop path
Reaction Time: 45.2 ms (median), 75.8 ms (99th percentile)
Total Time to Reroute: 71.2 ms (median), 116 ms (99th percentile)
ONOS Summary
• Control isolation (sharding): divide the network into parts and control them exclusively; load balancing -> we can do more
• Distributed data store that scales with controller nodes, with HA -> though we need a low-latency distributed data store
• Dynamic controller assignment to parts of the network: dynamically assign which part of the network is controlled by which controller instance -> we can do better with sophisticated algorithms
• Graph abstraction of network state: easy to visualize and correlate with topology; enables several standard graph algorithms