When DevOps and Networking Intersect by Brent Salisbury of socketplane.io

Post on 27-Jun-2015


when network and devops intersect

Brent Salisbury socketplane.io

socketplane.io - docker networking

John Willis Co-Founder & VP Business Development Formerly: CTO, Stateless Networks

Madhu Venugopal Co-Founder & President Formerly: Principal Engineer Office of the CTO, Red Hat

Brent Salisbury Co-Founder & VP Engineering Formerly: Senior Engineer Office of the CTO, Red Hat

Dave Tucker Co-Founder, VP Product Formerly: Senior Engineer Office of the CTO, Red Hat

lessons_learned struct

1. the evolving network
2. lessons learned from controller development
3. netops from an operational+dev view
4. looking ahead

the problem

[Figure: cost versus number of widgets (economies of scale). Labels: Network, Compute - Storage, Vertical Integration, Horizontal Scale]

[Figure: network capacity needs versus network usage growth over time, in three panels: Over Provisioned, Under Provisioned, Efficient Provisioning]

Where we were

• CLI for everything
• vendor management tools did everything and nothing.
• used to be Perl, TCL and later Python
• zero ip management
• turned into a contest of who can make the best obscure magic

Where we are

• CLI for everything
• vendor management tools did everything and nothing.
• used to be Perl, TCL and later Python
• zero ip management
• turned into a contest of who can make the best obscure magic

where we are(ish)

• exponential growth with flat operating budgets
• incessant pressure for uptime + capex/opex cost reduction
• the majority of networks still maintain proprietary hw, sw and api
• datapaths are still barely programmable
• netops manages very little beyond the ToR.

quick review of node distribution

• distributed
• centralized
• de-centralized

Centralized

[Figure: centralized model. Labels: Forwarding Population, Controller, Match + Action]

the sdn approach

Decentralized

[Figure: decentralized model. Labels: Topology, Forwarding Population + Clustered Controller, Orchestration, Match + Action]

the sdn approach

similarly both hard problems

[Figure: two systems side by side. Left: a Routing Engine with Line Card 1, Line Card 2, Line Card ... connected over a Bus. Right: a Controller with OVS, an OF Switch and a Random Agent connected over an Ethernet Fabric. Each element has ports P1, P2, P... and a Data Plane holding the same flow table. Match fields: Ingress Port, MAC Source Address, MAC Destination, IP Source Address, IP Destination, Protocol, Source Port, Destination Port, Priority (all wildcarded with * in the example row). Instructions: GOTO/Drop/Controller/Normal]
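The match + action tables shown above can be sketched in a few lines. This is a minimal illustration, not how OVS or any OpenFlow switch is implemented: the names `FlowRule`, `matches` and `lookup`, the field keys and the sample addresses are all hypothetical. It only shows the idea that a packet is compared against wildcardable match fields and the highest-priority matching rule supplies the instruction.

```python
from dataclasses import dataclass, field

WILDCARD = "*"


@dataclass
class FlowRule:
    """One row of a flow table (hypothetical structure for illustration)."""
    priority: int
    instruction: str                            # e.g. "GOTO", "Drop", "Controller", "Normal"
    match: dict = field(default_factory=dict)   # absent fields act as wildcards


def matches(rule, packet):
    """A rule matches when every non-wildcard field equals the packet's value."""
    return all(v == WILDCARD or packet.get(k) == v for k, v in rule.match.items())


def lookup(table, packet):
    """Return the highest-priority matching rule, or None on a table miss."""
    candidates = [r for r in table if matches(r, packet)]
    return max(candidates, key=lambda r: r.priority, default=None)


# Sample table: a catch-all, an ARP rule, and a specific drop rule.
table = [
    FlowRule(priority=0, instruction="Normal"),
    FlowRule(priority=100, instruction="Controller", match={"protocol": "arp"}),
    FlowRule(priority=200, instruction="Drop",
             match={"ip_dst": "10.0.0.2", "dst_port": 22}),
]

ssh_packet = {"ingress_port": 1, "ip_dst": "10.0.0.2",
              "protocol": "tcp", "dst_port": 22}
print(lookup(table, ssh_packet).instruction)
```

A real datapath uses far more efficient classifiers than a linear scan; the scan here is only to keep the matching semantics visible.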

Distributed

the internets scales

[Figure: L2 Flooding and Learning. Host 1 and Host 2 attached to two data planes, with flooding on VLAN x on both sides]

the barrier to scale

• Live workload migration cripples network ops
• subnets for policy groupings are the only reason to think in those terms anymore

shit that doesn't scale

• the next few slides are things i thought were possible at some point around the problem of L2
• lesson learned: prototype and fail faster
• ask your team why they really need L2

Proactive L2 Flooding and Learning with Legacy VLANs

[Figure: an OpenFlow Controller above two data planes connecting Host 1 and Host 2, with flooding on VLAN x on both sides]

• Maintaining legacy broadcast domains: the controller never punts ARP
• Proactive rule - Match: ARP, Action: Normal
• Can also serve as a fallback failure mode or hybrid migration strategy

Reactive OpenFlow Flow Policy

[Figure: an OpenFlow Switch (data plane ports P1, P2, P3, serving Svr 1, Svr 2, Svr 3) sends the 1st packet in a flow to the OpenFlow Controller as a Packet-In. The flow table has match fields Ingress Port, MAC Source Address, MAC Destination, IP Source Address, IP Destination, Protocol, Source Port, Destination Port, Priority (wildcarded with *) and instructions GOTO/Drop/Controller/Normal]

• 1st packet in flow: Packet-In
• A Flowmod installs a flow rule for subsequent matching packets
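The reactive pattern above (first packet goes to the controller, a Flowmod installs a rule, later packets match in the datapath) can be sketched as follows. This is a toy model under stated assumptions, not an OpenFlow implementation: the `Controller` and `Switch` classes, the `policy` mapping from destination MAC to output port, and the method names are all hypothetical.

```python
class Controller:
    """Toy controller: decides an action per destination MAC (hypothetical policy)."""

    def __init__(self, policy):
        self.policy = policy      # dst MAC -> output port
        self.packet_ins = 0       # how many packets were punted to us

    def packet_in(self, switch, packet):
        self.packet_ins += 1
        out_port = self.policy.get(packet["mac_dst"])
        action = ("output", out_port) if out_port is not None else ("drop", None)
        # The "Flowmod": install a rule so later packets never reach the controller.
        switch.flow_mod(match={"mac_dst": packet["mac_dst"]}, action=action)
        return action


class Switch:
    """Toy switch: consults its flow table first, punts to the controller on a miss."""

    def __init__(self, controller):
        self.controller = controller
        self.flows = []           # list of (match, action) pairs

    def flow_mod(self, match, action):
        self.flows.append((match, action))

    def receive(self, packet):
        for match, action in self.flows:
            if all(packet.get(k) == v for k, v in match.items()):
                return action     # handled entirely in the data plane
        return self.controller.packet_in(self, packet)   # table miss: Packet-In


controller = Controller(policy={"00:00:00:00:00:02": 2})
switch = Switch(controller)
pkt = {"mac_src": "00:00:00:00:00:01", "mac_dst": "00:00:00:00:00:02"}

first = switch.receive(pkt)    # miss: controller is consulted, rule installed
second = switch.receive(pkt)   # hit: served from the flow table
print(first, second, controller.packet_ins)
```

The cost this makes visible is that every new flow pays a round trip to the controller, which is one reason the proactive rule shown earlier can serve as a fallback.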

Controller Intercepting ARP and Proxying the Reply

[Figure: Host 1 on Switch 1 and Host 2 on Switch 2. Both switches carry the rule Match: ARP, Action: Controller, so the ARP request and reply pass through the OpenFlow Controller. The controller keeps a table keyed by host, with the location as the value: Host 2 IP, MAC, Tenant ==> Tunnel 200, TEP IP]

• VLAN ID constraints become irrelevant
• Tenancy is maintained in the controller
• The controller can answer and/or send ARP (proxy)
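The host (key) to location (value) table described above can be sketched as a simple dictionary lookup. This is a minimal sketch under assumptions: `ArpProxy`, `Location`, the tenant names and the addresses are hypothetical, and only the tunnel ID 200 comes from the slide. It shows how keying on tenant plus IP lets the controller answer ARP itself and keep tenants apart without relying on VLAN IDs.

```python
from typing import NamedTuple, Optional


class Location(NamedTuple):
    """Where a host lives (the 'value' side of the table)."""
    mac: str
    tunnel_id: int
    tep_ip: str     # tunnel endpoint IP of the switch hosting it


class ArpProxy:
    """Toy controller-side ARP responder keyed by (tenant, IP)."""

    def __init__(self):
        self.hosts = {}   # (tenant, ip) -> Location

    def learn(self, tenant, ip, mac, tunnel_id, tep_ip):
        self.hosts[(tenant, ip)] = Location(mac, tunnel_id, tep_ip)

    def handle_arp_request(self, tenant, target_ip) -> Optional[dict]:
        """Answer on behalf of the target if known, otherwise return None."""
        loc = self.hosts.get((tenant, target_ip))
        if loc is None:
            return None   # unknown host: no flooding, no reply
        return {"op": "reply", "ip": target_ip, "mac": loc.mac}


proxy = ArpProxy()
proxy.learn("blue", "10.0.0.2", "00:00:00:00:00:02", tunnel_id=200, tep_ip="192.0.2.20")

print(proxy.handle_arp_request("blue", "10.0.0.2"))
print(proxy.handle_arp_request("red", "10.0.0.2"))   # same IP, different tenant
```

Because the tenant is part of the key, two tenants can reuse the same IP space, which is the sense in which tenancy is maintained in the controller.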

Controller Connects Source and Destination Hosts via Packet-In and Flowmods

[Figure: Host 1 on Switch 1 sends an ARP request; both switches carry the rule Match: ARP, Action: Controller. The OpenFlow Controller looks up Host 2 (IP, MAC, Tenant ==> Tunnel 200, TEP IP; host is the key, location is the value) and sends a Flowmod to each switch to build the data path (tunnel or flow path) between Host 1 and Host 2]

• VLAN ID constraints become irrelevant
• Tenancy is maintained in the controller

not if but when

• build infrastructure for the worst case scenario, because it will be worse.
• cascading failures suck
• focus on solving the problem, not the implementation
• intelligence in the datapath HW is a good thing, ideally if it is also open and programmatically manageable

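Control and Data Plane Split Brain

[Figure: a control plane above three datapaths, each identified by a DPID, with ports P1, P2, P3. The connection to the data plane with DPID ::00:01 is marked with an X, and question marks surround the remaining elements]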

[Figure: Linux Bridging. Labels: Frame In, Bridge, IPTables, HAProxy, Functions X, Y, Z, Frame Egress]

this movie has a shitty ending

What Works: Performance and Reliability First

OVS/DPDK Packet Forwarding Pipeline

[Figure: pipeline stages from Frame In through a Classifier and Table 0, Table 2, ..., Table n to Frame Out, with Function Foo and Function Bar as stages]

Data Center Today

[Figure: two Data Center L3 Cores above three pairs of Physical Switch and vSwitch, with a Firewall applying North/South Security Policy]

traffic alignment from the 90’s

Distributed Policy Application For Data Center

[Figure: two Data Center L3 Cores above three pairs of Physical Switch and vSwitch, with East West Security Policy applied across them]

new architectures for new workloads

trust what you know

• rely on your own operational experiences; if you don't have any, go get some, even if it's stalking customers
• don't fall in love with implementations, they are probably wrong
• ask questions but be open minded
• avoid slide jockeys
• avoid the vendor wars
• avoid cults
• complexity w/o abstraction fails
• almost all abstractions fail

serenity now, insanity later

• make time for research and planning
• whether it is a big infra project or a dev sprint, don't let the oppressive demand of execution compromise a practical design

• that said, if the plan sucks, change it.

nothing is easy, don't make it harder

• prototyping and early feedback should be your compass

• when users say, "this seems a little too complex", LISTEN!

• odds are you aren't going to be able to get the right abstraction to hide your over-engineering

performance and reliability first

• network operators are measured in uptime first
• don't compromise reliability for cost savings without making it very clear to all leadership, not just the IT manager heroes.

• perform consistency checking

/dev

• understand the problem first
• if you don't understand the problem, stalk someone who does
• make readable code

• code for the worst case scenario

architecture

• if it isn't broke, don't break it
• architects need understandable components
• architects need predictable components
• predictive analysis is a big data problem
• predict problems with operational tools and data
• don't build a nuclear submarine when a bicycle will do

test and prototype

• verify before you hit enter
• automate all production changes
• set up rollback processes
• the result should be:
  • shorter change windows
  • faster rollbacks
  • better trained operators
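The verify, automate and rollback steps above can be sketched as one small function. This is a minimal sketch under assumptions, not a real change-management tool: `apply_change`, `check_config`, the config keys and the value ranges used for validation are all hypothetical. The change is applied to a copy of the running config, checked, and only kept if the check passes, so a failed change leaves the running config untouched.

```python
import copy


def check_config(config):
    """Hypothetical sanity checks: MTU in a plausible range, VLAN IDs valid."""
    return (1280 <= config["mtu"] <= 9216
            and all(1 <= v <= 4094 for v in config["vlans"]))


def apply_change(running, change, check):
    """Apply `change` to a copy; return (new_config, True) or (running, False)."""
    candidate = copy.deepcopy(running)   # never mutate the running config directly
    change(candidate)
    if check(candidate):
        return candidate, True           # verified: commit
    return running, False                # failed verification: roll back


running = {"mtu": 1500, "vlans": [10, 20]}


def raise_mtu(cfg):
    cfg["mtu"] = 9000


def add_bad_vlan(cfg):
    cfg["vlans"].append(5000)            # out of range, should be rejected


after_good, ok_good = apply_change(running, raise_mtu, check_config)
after_bad, ok_bad = apply_change(after_good, add_bad_vlan, check_config)
print(ok_good, after_good)
print(ok_bad, after_bad)
```

Keeping the check and the rollback in the same code path as the change is what makes the change window shorter: nobody has to decide under pressure whether to back out.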

everybody is smart

• "A great team doesn’t mean that they had the smartest people. What made those teams great is that everyone trusted one another. It can be a powerful thing when that magic dynamic exists." -Gene Kim

team culture

• not proving how much smarter you are than your co-workers.
• give credit to the team first, it's just weird otherwise
• don't hoard contacts
• find people's passion and maximize it
• protect your culture's morale like it is your bank account

where to start?

• starting out
  • no one can learn for you, find your passion
  • learn linux
  • explore vswitches, I recommend http://openvswitch.org
  • connect with peers in the community and share experiences
  • explore compute (containers, hypervisors and everything else beyond the top of rack)
• further along
  • code, i recommend Golang atm fwiw
  • learn CI tools and sw dev processes
  • contribute to upstream open source
  • build something that solves others' problems and open source it