OpenContrail Cloudwatt Feedback
ethuleau
OpenContrail deployment experience at Cloudwatt
About me
● Network engineer since 2006
● Working on OpenStack since its beginning in 2010
● Working on OpenContrail for a year as a developer and integrator
Cloudwatt IaaS
● French public cloud provider
● 3 years of experience with OpenStack
● 1 year of experience with OpenContrail
  ○ 1 data center
    ■ 200 compute nodes
    ■ 3 PB of raw Swift storage
  ○ OpenStack Icehouse release
Contrail in Cloudwatt
● Started with Contrail release 1.06 in June 2014
● Runs on a Cisco Nexus FabricPath fabric
● L2VPN tunnels terminated on two Juniper MX routers
Contrail in Cloudwatt
[Diagram: Contrail logical view, with the Config, Neutron API, Analytics and Control components, IF-MAP, and the vrouters]
Contrail in Cloudwatt
● 2 Neutron API nodes: neutron-server with the Contrail plugin
● 2 config nodes: discovery, API server, service monitor, schema transformer, IF-MAP server
● 2 control nodes
● 2 analytics nodes
● 2 WebUI nodes
Contrail in Cloudwatt
[Diagram: redundant deployment with two each of Config, Neutron API, Analytics, Control, IF-MAP and WebUI; the control nodes talk to the vrouters over XMPP]
Contrail in Cloudwatt
● Load balancing in front of the APIs and the WebUI
● 2 Cassandra clusters of 3 nodes each
● RabbitMQ cluster of 2 nodes
● ZooKeeper cluster of 3 nodes
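The 3-node sizes above follow majority-quorum arithmetic: an ensemble of n voters needs n // 2 + 1 of them reachable. The Python sketch below only does that arithmetic. ZooKeeper uses majority quorums; for Cassandra it assumes a replication factor of 3 with QUORUM reads and writes, which the slides do not state. RabbitMQ is left out because its partition behavior depends on configuration rather than on a fixed majority rule.

```python
def quorum(members: int) -> int:
    """Smallest majority of an ensemble with `members` voters."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


# ZooKeeper: 3-node ensemble, as on the slide.
# Cassandra: ASSUMPTION of replication factor 3 with QUORUM consistency,
# so the relevant voters are the 3 replicas of each row.
for name, members in [("ZooKeeper ensemble", 3), ("Cassandra replicas (assumed RF=3)", 3)]:
    print(f"{name}: quorum={quorum(members)}, tolerates {tolerated_failures(members)} failure(s)")

# For comparison, a 2-member majority-based ensemble tolerates no failure.
print(f"2 members: quorum={quorum(2)}, tolerates {tolerated_failures(2)} failure(s)")
```

With majority quorums, 3 is the smallest cluster size that survives losing one node; going from 3 to 4 members does not add any tolerance.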
Contrail in Cloudwatt
[Diagram: the redundant deployment above plus its backing services: two Cassandra clusters and the AMQP (RabbitMQ) + ZooKeeper cluster, with the IF-MAP and XMPP links]
Issues with 1.06
● Hard to operate, upgrade and maintain without downtime
● Stability and compatibility of the Neutron-to-Contrail translation API
● Analytics does not work
● Some memory leaks on the compute nodes
Upgrade to 1.10
● After nine months on 1.06
● New version to fix issues and bring new features (SNAT/LBaaS)
● Stay close to upstream
Upgrade to 1.10
● Create a tool to monitor the Contrail cluster status
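The slides do not describe how the monitoring tool works. As a minimal sketch of one possible approach, the Python below aggregates per-node service states and flags each service as ok, degraded (some instances down) or down (all instances down), which matches the 2x redundant layout. The input shape, node names and states are hypothetical; collecting them (SSH, an agent, an introspect endpoint) is left out. The `healthy_states` parameter exists because active/standby services such as the schema transformer may report a standby state that should count as healthy; the label "backup" used in the test is a hypothetical example of such a state.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical input shape: {node: {service: status}}.
# Nothing here talks to a real Contrail node.
ClusterState = Dict[str, Dict[str, str]]


def service_summary(state: ClusterState,
                    healthy_states: Iterable[str] = ("active",)) -> Dict[str, Tuple[int, int]]:
    """Map each service to (healthy instances, total instances) across nodes."""
    healthy_set = set(healthy_states)
    summary: Dict[str, Tuple[int, int]] = {}
    for services in state.values():
        for service, status in services.items():
            healthy, total = summary.get(service, (0, 0))
            summary[service] = (healthy + int(status in healthy_set), total + 1)
    return summary


def classify(summary: Dict[str, Tuple[int, int]]) -> Dict[str, str]:
    """'ok' if every instance is healthy, 'degraded' if some are, 'down' if none are."""
    verdict = {}
    for service, (healthy, total) in summary.items():
        if healthy == total:
            verdict[service] = "ok"
        elif healthy > 0:
            verdict[service] = "degraded"
        else:
            verdict[service] = "down"
    return verdict


if __name__ == "__main__":
    # Made-up snapshot of a cluster with hypothetical host names.
    example = {
        "config1": {"contrail-api": "active"},
        "config2": {"contrail-api": "active"},
        "control1": {"contrail-control": "active"},
        "control2": {"contrail-control": "inactive"},
        "analytics1": {"contrail-collector": "inactive"},
        "analytics2": {"contrail-collector": "inactive"},
    }
    for service, verdict in sorted(classify(service_summary(example)).items()):
        print(service, verdict)
```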
Upgrade to 1.10
We decided to do it in 2 steps:
1. Control plane (in one night)
  ○ Config (schema slave first)
  ○ Control
  ○ Analytics
  ○ WebUI
  ○ Neutron API
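The control plane step can be written down as an ordered plan. The Python sketch below takes the role order from the slide (Config, Control, Analytics, WebUI, Neutron API) and puts the schema slave before the schema master among the config nodes. The `Node` type, role names and host names are hypothetical and only serve the illustration.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    host: str
    role: str
    schema_master: bool = False  # only meaningful for config nodes


# Role order taken from the slide.
ROLE_ORDER = ["config", "control", "analytics", "webui", "neutron-api"]


def control_plane_plan(nodes: List[Node]) -> List[str]:
    """Return host names in upgrade order, schema slave before schema master."""
    unknown = {n.role for n in nodes} - set(ROLE_ORDER)
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    plan: List[str] = []
    for role in ROLE_ORDER:
        members = [n for n in nodes if n.role == role]
        # False sorts before True, so the slave (schema_master=False) comes first.
        members.sort(key=lambda n: (n.schema_master, n.host))
        plan.extend(n.host for n in members)
    return plan


if __name__ == "__main__":
    nodes = [
        Node("config1", "config", schema_master=True),
        Node("config2", "config"),
        Node("control1", "control"),
        Node("control2", "control"),
        Node("analytics1", "analytics"),
        Node("webui1", "webui"),
        Node("neutron1", "neutron-api"),
    ]
    print(control_plane_plan(nodes))
```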
Upgrade to 1.10
2. Data plane (over a few days)
  ○ upgrade/bootstrap spare compute nodes in 1.10 and add them to the available compute pools
  ○ remove all running 1.06 compute nodes from the available pool
  ○ give clients on those 1.06 nodes a time slot to move their VMs before upgrading the node to 1.10 (no live migration)
  ○ then open the champagne bottles!
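The data plane steps above amount to a small state machine over compute nodes. The Python sketch below models it under stated assumptions: spare nodes are already bootstrapped in 1.10 before the migration opens, `available` stands in for whatever mechanism removes a node from the scheduling pool (the slides do not say which), and once the client time slot ends the remaining 1.06 nodes are upgraded even if VMs are left on them (the slides do not say what happens then). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ComputeNode:
    name: str
    version: str
    available: bool  # may the scheduler place new VMs here?
    vm_count: int


def open_migration(nodes: List[ComputeNode], target: str) -> None:
    """Spares already on `target` join the pool; old nodes leave it."""
    for node in nodes:
        node.available = node.version == target


def ready_to_upgrade(nodes: List[ComputeNode], target: str, window_over: bool) -> List[str]:
    """Old nodes that clients have emptied, or every old node once the time slot ends."""
    return [n.name for n in nodes
            if n.version != target and (n.vm_count == 0 or window_over)]


def upgrade(node: ComputeNode, target: str) -> None:
    """Upgrade the node and put it back in the pool."""
    node.version = target
    node.available = True
```

For example, with one 1.10 spare, one busy 1.06 node and one empty 1.06 node, `open_migration` leaves only the spare schedulable, and only the empty node is ready until the time slot is over.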
Bugs hit during the upgrade
● vrouter 1.06 cannot interoperate with 1.10 over MPLSoUDP encapsulation => switch to MPLSoGRE while both versions coexist
● The SNAT/LBaaS code does not take the vrouter version into account
● The whole Contrail API slowed down because the Neutron Contrail plugin code moved from neutron-server into the Contrail API
● ZooKeeper timeouts
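The MPLSoUDP workaround above is a rule over the set of vrouter versions in the fleet. The Python sketch below computes an encapsulation preference list: MPLSoGRE first while 1.06 and 1.10 vrouters coexist. It assumes MPLSoUDP is preferred whenever all vrouters run the same version; the slides only say that MPLSoUDP was in use and that MPLSoGRE was used during cohabitation. Contrail sets its encapsulation priority cluster-wide; pushing the list to the cluster is not shown.

```python
from typing import Iterable, List

# Versions that, per the slide, cannot interoperate with 1.10 over MPLSoUDP.
LEGACY_VERSIONS = {"1.06"}


def encap_priority(vrouter_versions: Iterable[str]) -> List[str]:
    """Ordered encapsulation preference for the whole cluster."""
    versions = set(vrouter_versions)
    if versions & LEGACY_VERSIONS and versions - LEGACY_VERSIONS:
        # Mixed 1.06/1.10 fleet: keep MPLSoGRE on top during cohabitation.
        return ["MPLSoGRE", "MPLSoUDP"]
    # ASSUMPTION: MPLSoUDP is preferred when every vrouter runs the same version.
    return ["MPLSoUDP", "MPLSoGRE"]
```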
Bugs hit after the upgrade
● Memory leak in the data path kernel module
● Hold-flow count leak in the data path kernel module (workaround: restart the vrouter agent)
● 13 Cloudwatt patches added to the 1.10 upstream release: https://review.opencontrail.org/#/q/status:open+branch:R1.10,n,z
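Applying the restart workaround above requires noticing the leak first. The slides do not say how that was done; the Python below is a hypothetical heuristic only: given periodic readings of a node's hold-flow count, it flags a counter that never decreases over the window and ends above a threshold. How readings are collected, the threshold value and the restart itself are all left out. Steadily growing legitimate traffic would also trip this check, so it only marks candidates for a closer look.

```python
from typing import Sequence


def looks_like_leak(samples: Sequence[int], threshold: int) -> bool:
    """Flag a counter that never decreases over the window and ends above threshold.

    `samples` are periodic hold-flow count readings from one compute node.
    Both the heuristic and the threshold are assumptions.
    """
    if len(samples) < 2:
        return False
    never_drops = all(later >= earlier for earlier, later in zip(samples, samples[1:]))
    return never_drops and samples[-1] > threshold


if __name__ == "__main__":
    readings = [1200, 1350, 1500, 1720, 1900]  # made-up readings
    if looks_like_leak(readings, threshold=1500):
        print("hold-flow count keeps growing: candidate for a vrouter agent restart")
```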
Bugs still present in 1.10
● Schema slave->master switchover takes ~20 minutes
● Logging configuration
● Some 5xx errors still appear on the Contrail API
● Live upgrade of a compute node without downtime (do we need it?)
My wishlist for Santa SDN
● That people use https://blueprints.launchpad.net/opencontrail more
● A stable master before branching a new release
● Use http://semver.org to number releases
● The Contrail team to be more community-oriented
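One reason for the SemVer wish: releases numbered "1.06" and "1.10" are ambiguous, and SemVer's MAJOR.MINOR.PATCH form forbids leading zeroes, so "06" would not be valid. The Python sketch below parses only the core SemVer form (pre-release and build metadata are deliberately not handled) and shows why versions must be compared numerically rather than as strings.

```python
import re
from typing import Tuple

# Core MAJOR.MINOR.PATCH form from semver.org: non-negative integers, no leading zeroes.
CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse(version: str) -> Tuple[int, int, int]:
    match = CORE.fullmatch(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_breaking(old: str, new: str) -> bool:
    """Under SemVer, a major bump signals incompatible API changes."""
    return parse(new)[0] > parse(old)[0]


if __name__ == "__main__":
    print("1.9.0" < "1.10.0")                # False: plain string comparison
    print(parse("1.9.0") < parse("1.10.0"))  # True: numeric comparison
```

Under this scheme, a move to a 2.x branch would announce incompatible changes by its major number alone.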
H2 2015 to-do
● Improve the Neutron Contrail plugin code: https://review.opencontrail.org/10123
● Upgrade to the 2.x branch
● Build a CI/CD on master
  ○ build and deploy daily
  ○ run the OpenContrail sanity tests
  ○ run functional non-regression tests
  ○ run performance non-regression tests
● OpenStack L3VPN integration
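The daily CI/CD run listed above is an ordered sequence of stages. The Python sketch below is a minimal runner under that reading: stages run in order and, after the first failure, the rest are reported as skipped. The stage names mirror the slide, but the callables are placeholders, not real build or test jobs. Stopping early is a design choice that saves time on a broken build; running the non-regression suites independently would be an equally valid alternative.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in order; after the first failure, the remaining ones are skipped."""
    report: List[Tuple[str, str]] = []
    failed = False
    for name, step in stages:
        if failed:
            report.append((name, "skipped"))
            continue
        ok = step()
        report.append((name, "passed" if ok else "failed"))
        failed = not ok
    return report


if __name__ == "__main__":
    # Placeholder steps standing in for real build/deploy/test jobs.
    daily = [
        ("build master", lambda: True),
        ("deploy", lambda: True),
        ("opencontrail sanity", lambda: True),
        ("functional no-reg", lambda: False),
        ("performance no-reg", lambda: True),
    ]
    for name, result in run_pipeline(daily):
        print(f"{name}: {result}")
```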
Questions ?