Ganeti's best kept secrets, and exciting new developments
Ganeti Eng Team, Google
LinuxCon Japan 2014, 2 Feb 2014
Introduction to Ganeti
A cluster virtualization manager, in one slide

What is Ganeti?

- Manage clusters of 1-200 physical machines, divided in nodegroups
- Deploy Xen/KVM/LXC virtual machines on them
- Controlled via command line, REST, web interfaces
- Live migration
- Resiliency to failure (DRBD, Ceph, SAN/NAS, ...)
- Cluster balancing
- Ease of repairs and hardware swaps
Newest features
Development status

2.10: The very stable release

- Improved upgrade procedure: "gnt-cluster upgrade"
- CPU load in hail/hbal (GSoC project)
- Hotplug support (KVM)
- RBD storage direct access (KVM)
- Better Open vSwitch support (GSoC project)
2.11: The latest stable release

- Faster instance moves
- GlusterFS support
- hsqueeze (achieve maximum cluster compaction)
2.12 and future: The next stable release(s)

- Jobs as processes
- New install model
- More secure master candidates
- Better container support (GSoC)
- Resource reservation / extra parallelization
- Generic conversion between disk templates (GSoC)
Monitoring daemon
What's going on in your cluster?

Monitoring a cluster: the old school way

[Diagram: an external monitoring system polls the cluster components directly (master, nodes, instances, storage, NICs) and feeds other systems]
Monitoring a cluster: using the monitoring daemon

[Diagram: the monitoring system queries a monitoring daemon on each node, which reports the cluster state to it and to other systems]
What is the monitoring daemon?

Provides information:

- about the cluster state/health
- live
- read-only

design doc: design-monitoring-agent.rst
More details

- HTTP daemon replying to REST-like queries (actually, GET only)
- Providing JSON replies
  - Easy to parse in any language
  - Already used in all the rest of Ganeti
- Running on every node (not only master-candidates or VM-enabled nodes)
- Additionally: mon-collector, a quick 'n dirty CLI tool
Data collectors

- provide data to the daemon
- one collector, one report
- one collector, one category: storage, hypervisor, daemon, instance
- two kinds: performance reporting, status reporting
- new feature: stateful data collectors
Data collectors: what data can be retrieved right now?

Now:

- instance status (Xen only) (category: instance)
- diskstats information (storage)
- LVM logical volumes information (storage)
- DRBD status information (storage)
- Node OS CPU load average (no category, default)

Soon(-ish):

- instance status for KVM (instance)
- Ganeti daemons status (daemon)
- Hypervisor resources (hypervisor)
- Node OS resources report (default)
The report format

{
  "name": "TheCollectorIdentifier",
  "version": "1.2",
  "format_version": 1,
  "timestamp": 1351607182000000000,
  "category": null,
  "kind": 0,
  "data": { "plugin_specific_data": "go_here" }
}

- name: the name of the plugin. Unique string.
- version: the version of the plugin. A string.
- format_version: the version of the data format of the plugin. Incremental integer.
- timestamp: when the report was produced. Nanoseconds. Can be zero-padded.
Status reporting collectors: report

They introduce a mandatory part inside the data section.

"data" : {
  ...
  "status" : {
    "code" : <value>,
    "message" : "some summary goes here"
  }
}

<value>, by increasing criticality level:

- 0: working as intended
- 1: temporarily wrong. Being auto-repaired
- 2: unknown. Potentially dangerous state
- 4: problems. External intervention required
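Reacting to those levels mechanically is straightforward; here is a minimal sketch that inspects a status report. The sample report, the helper name and the assumption that kind=1 marks status-reporting collectors are mine, not taken verbatim from Ganeti:

```python
# Status codes of status-reporting collectors, by increasing criticality.
STATUS_MEANINGS = {
    0: "working as intended",
    1: "temporarily wrong, being auto-repaired",
    2: "unknown, potentially dangerous state",
    4: "problems, external intervention required",
}

def needs_intervention(report):
    """Return True if a status report asks for external intervention."""
    return report["data"]["status"]["code"] == 4

# Hypothetical example report, following the format of the previous slide.
report = {
    "name": "ExampleStatusCollector",
    "version": "1.2",
    "format_version": 1,
    "timestamp": 1351607182000000000,
    "category": "storage",
    "kind": 1,  # assumed: 1 = status reporting collector
    "data": {
        "status": {"code": 1, "message": "resyncing a DRBD disk"},
    },
}

print(STATUS_MEANINGS[report["data"]["status"]["code"]])
print(needs_intervention(report))
```

A monitoring dashboard could poll every collector and page an operator only when `needs_intervention` turns true.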
How to use the daemon?

- Accepts HTTP connections on node.example.com:1815
- GET requests to specific addresses
  - Each address returns different info according to the API
- Not authenticated: read only
  - Just firewall it, or bind on a local address only

Addresses:

- / (returns the list of supported protocol versions)
- /1/list/collectors
- /1/report/all
- /1/report/[category]/[collector_name]
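A client only needs to build the right GET paths against port 1815; a tiny sketch, where the helper names are mine and "default" as the category of category-less collectors is an assumption based on the previous slide:

```python
MON_PORT = 1815  # default monitoring daemon port

def report_path(category=None, collector=None):
    """Build the query path for the monitoring daemon's report API."""
    if category is None and collector is None:
        return "/1/report/all"
    # Collectors without a category live under the "default" category.
    return "/1/report/%s/%s" % (category or "default", collector)

def report_url(node, category=None, collector=None):
    """Full URL for a report query against a given node."""
    return "http://%s:%d%s" % (node, MON_PORT,
                               report_path(category, collector))

print(report_url("node.example.com"))
print(report_url("node.example.com", "storage", "drbd"))
```

The resulting URLs can be fetched with any HTTP client (curl, a browser, urllib) since the API is plain unauthenticated GET.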
Configuration daemon (confd)
What is your cluster supposed to look like?

Before confd

- Configuration only available on master candidates
- Few selected values replicated with ssconf
  - Small pieces of config in text files on all the nodes
  - Doesn't scale
- Need for a way to access the config from other nodes
  - Scalable
  - No single point of failure (so, no RAPI)
What does confd do?

- Provides information from config.data
- Read-only
- Distributed
  - Multiple daemons running on master candidates
  - Accessible from all the nodes through the confd protocol
  - Resilient to failures
- Optional
What info does it provide?

Replies to simple queries:

- Ping
- Master IP
- Node role
- Node primary IP
- Master candidates primary IPs
- Instance IPs
- Node primary IP from instance primary IP
- Node DRBD minors
- Node instances
confd protocol: general description

- UDP (port 1814)
- Queries made to any subset of master candidates
  - Timeout
  - Maximum number of expected replies
- keyed-Hash Message Authentication Code (HMAC) authentication
  - Pre-shared, cluster-wide key
  - Generated at cluster init
  - Root-only readable
- Timestamp
  - Checked (± 2.5 mins) to prevent replay attacks
  - Used as HMAC salt
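The ± 2.5 minute replay window is easy to picture in code; a minimal sketch, where the function name and structure are mine:

```python
import time

REPLAY_WINDOW = 150  # seconds: +/- 2.5 minutes around the receiver's clock

def timestamp_is_fresh(msg_timestamp, now=None):
    """Reject messages whose timestamp salt falls outside the window."""
    if now is None:
        now = time.time()
    return abs(now - msg_timestamp) <= REPLAY_WINDOW

now = 1249637704
print(timestamp_is_fresh(now - 60, now=now))   # recent: accepted
print(timestamp_is_fresh(now - 600, now=now))  # stale: possible replay
```

Because the timestamp doubles as the HMAC salt, an attacker cannot refresh a captured packet without invalidating its signature.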
confd protocol: request/reply

[Diagram: the client sends the request to several master candidates; replies come back tagged with a config version (e.g. v: 56, v: 57), and the client stops as soon as enough replies have arrived, or on timeout]
confd protocol: request

plj0{
  "msg": "{\"type\": 1, \"rsalt\": \"9aa6ce92-8336-11de-af38-001d093e835f\", \"protocol\": 1, \"query\": \"node1.example.com\"}\n",
  "salt": "1249637704",
  "hmac": "4a4139b2c3c5921f7e439469a0a45ad200aead0f"
}

- plj0: fourcc detailing the message content (PLain Json 0)
- hmac: HMAC signature of salt+msg with the cluster HMAC key
- msg: JSON-encoded query
  - protocol: confd protocol version (=1)
  - type: what to ask for (CONFD_REQ_* constants)
  - query: additional parameters
  - rsalt: response salt == UUID identifying the request
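Building such a request can be sketched in a few lines of Python. The exact salt+msg concatenation and the use of SHA-1 are assumptions inferred from the 40-hex-digit signature above, and the key shown is a stand-in for the real pre-shared cluster key:

```python
import hashlib
import hmac
import json

def sign_request(hmac_key, msg, salt):
    """Compute the hmac field: signature of salt+msg with the cluster key."""
    return hmac.new(hmac_key, (salt + msg).encode(),
                    hashlib.sha1).hexdigest()

def build_request(hmac_key, query_type, query, rsalt, salt):
    """Serialize a confd request packet with the plj0 fourcc prefix."""
    inner = json.dumps({"type": query_type, "rsalt": rsalt,
                        "protocol": 1, "query": query})
    return "plj0" + json.dumps({
        "msg": inner,
        "salt": salt,
        "hmac": sign_request(hmac_key, inner, salt),
    })

packet = build_request(b"cluster-secret",  # hypothetical pre-shared key
                       1, "node1.example.com",
                       "9aa6ce92-8336-11de-af38-001d093e835f",
                       "1249637704")
print(packet[:4])  # -> plj0
```

A receiver verifies the packet by recomputing the signature over salt+msg and comparing it with the hmac field before parsing msg.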
confd protocol: reply

plj0{
  "msg": "{\"status\": 0, \"answer\": 0, \"serial\": 42, \"protocol\": 1}\n",
  "salt": "9aa6ce92-8336-11de-af38-001d093e835f",
  "hmac": "aaeccc0dff9328fdf7967cb600b6a80a6a9332af"
}

- salt: the rsalt of the query
- hmac: HMAC signature of salt+msg
- msg: JSON-encoded answer
  - protocol: protocol version (=1)
  - status: 0=ok; 1=error
  - answer: query-specific reply
  - serial: version of config.data
Ready-made clients

The protocol is simple, but clients are simpler.

Ready-to-use confd clients (since Ganeti 2.7):

- Python: lib/confd/client.py
- Haskell: src/Ganeti/Confd/Client.hs, src/Ganeti/Confd/ClientFunctions.hs
Expanding confd capabilities

- Currently not so many queries are supported
- Easy to add new ones:
  - Just add a new query type in the constants list
  - ...and extend the buildResponse function (src/Ganeti/Confd/Server.hs) to reply to it in the appropriate way
Ganeti and networks
How do your instances talk to the world?

Some slides contributed by Dimitris Aragiorgis <[email protected]>

NIC configuration

- current NICs: MAC + IP + link + mode
- mode=bridged uses brctl addif
- Hooks can deal with firewall rules, and more
- External systems needed for DHCP, IPv6, etc.

Management

- Which VMs are on the same collision domain?
- Which IP is free for a new VM to use?
gnt-network overview

- manage collision domains for your instances
- easy way to assign IPs to instances
  - If resources are shared in multiple clusters, allocation must be done externally
- keep existing per-NIC flexibility
- hide underlying infrastructure
- better networking overview
gnt-network: who does what?

- masterd: config.data integrity
  - abstract network infrastructure: network + netparams per nodegroup
  - IP uniqueness inside a network: IP pool management (bitarray, TemporaryReservationManager, locking)
  - encapsulate network information in NIC objects: RPC
- external scripts and hooks: ping vm1.ganeti.example.com
  - use the exported environment provided by noded
  - brctl, iptables, ebtables, ip rule, etc.
  - update external DHCP/DNS server entries
  - let the VM act unaware of the "situation" (dhclient, etc.)
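The IP pool can be pictured as one reservation bit per host address in the network. A toy sketch with the standard ipaddress module follows; it mimics the ip=pool behaviour but is not the actual bitarray-based implementation:

```python
import ipaddress

class IPPool:
    """Toy IP pool: one reservation flag per host address in a network."""

    def __init__(self, network, gateway=None):
        self.net = ipaddress.ip_network(network)
        self.hosts = list(self.net.hosts())
        self.reserved = [False] * len(self.hosts)
        if gateway:  # the gateway address is never handed out
            self.reserve(gateway)

    def reserve(self, ip):
        """Mark a specific address as taken."""
        idx = self.hosts.index(ipaddress.ip_address(ip))
        self.reserved[idx] = True

    def allocate(self):
        """Return and reserve the first free IP (ip=pool behaviour)."""
        for i, taken in enumerate(self.reserved):
            if not taken:
                self.reserved[i] = True
                return str(self.hosts[i])
        raise ValueError("network is full")

pool = IPPool("192.168.1.0/24", gateway="192.168.1.1")
print(pool.allocate())  # -> 192.168.1.2
```

With such a structure, answering "which IP is free for a new VM?" is a single scan, and uniqueness inside the network is guaranteed by construction.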
gnt-network + external scripts

- gnt-network alone is nothing more than a nice config.data
- snf-network: node-level scripts and hooks
- nfdhcpd: node-level DHCP server based on NFQUEUE
snf-network: node-level scripts and hooks

- overrides Ganeti default scripts (kvm-ifup, vif-ganeti)
- looks for specific tag types in the NIC's network
- applies corresponding rules
- creates nfdhcpd binding files
- provides hook to update DNS entries
nfdhcpd: node-level DHCP server based on NFQUEUE

- listens on a specific NFQUEUE
- updates its leases db
  - inotify on a specific directory for binding files
- mangles DHCP requests and replies based on its db
- responds to RS and NS for IPv6 auto-configuration
gnt-network: examples

Create and connect a new network:

gnt-network add --network 192.168.1.0/24 --gateway 192.168.1.1 --tags nfdhcpd net1
gnt-network connect net1 bridged prv0

Create an instance inside this network:

gnt-instance add --net 0:ip=pool,network=net1 ... inst1
gnt-instance info inst1
gnt-network info net1
gnt-network + snf-*: examples

Use snf-network and nfdhcpd:

apt-get install snf-network nfdhcpd
iptables -t mangle -A PREROUTING -i prv+ -p udp -m udp --dport 67 \
  -j NFQUEUE --queue-num 42
ip addr add 192.168.1.1/24 dev prv0

Test connectivity:

gnt-instance reboot inst1
ping 192.168.1.2
References

- snf-network: http://code.grnet.gr/git/snf-network
- nfdhcpd: http://code.grnet.gr/git/snf-nfdhcpd
Ganeti ExtStorage Interface
More options for your data

Some slides contributed by Constantinos Venetsanopoulos <[email protected]>

State before the ExtStorage Interface

- Non-mirrored templates: plain, file
- Internally mirrored templates: drbd
- Externally mirrored templates: sharedfile, rbd, blockdev, diskless
Ganeti and external SAN/NAS appliances

- Instance disks residing inside an external SAN/NAS appliance visible by all Ganeti nodes (e.g. NetApp, EMC, IBM)
- Instances should be able to migrate/failover to any node that can access the appliance.
- Ganeti should integrate with external SAN/NAS appliances in a generic way, independent of the appliance itself, in the easiest possible way from the admin's perspective.
Introducing the 'ExtStorage Interface'

- A simple interface inspired by the Ganeti OS interface
- To plug an appliance into Ganeti, we need a corresponding 'ExtStorage provider', which is a set of scripts residing under a directory, e.g. /usr/share/ganeti/extstorage/provider1/
ExtStorage provider methods

Every ExtStorage provider should provide the following methods:

- Create a disk on the appliance
- Remove a disk from the appliance
- Grow a disk on the appliance
- Attach a disk to a given Ganeti node
- Detach a disk from a given Ganeti node
- SetInfo on a disk (add metadata)
- Verify the provider's supported parameters
ExtStorage provider scripts

The methods are implemented in the corresponding 7 executable scripts, using appliance-specific tools:

# ls -l /usr/share/ganeti/extstorage/provider1
create  remove  grow  attach  detach  setinfo  verify

- attach returns a block device path on success
- Input via environment variables, e.g. VOL_NAME, VOL_SIZE
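A provider script receives everything through the environment. The following is a minimal sketch of a hypothetical create script; provision_volume is a placeholder for the appliance-specific tool, and the unit of VOL_SIZE is an assumption:

```python
# Sketch of a hypothetical ExtStorage 'create' script.
# VOL_NAME and VOL_SIZE are set by Ganeti in the environment.

def read_params(environ):
    """Pull the volume name and size out of the script's environment."""
    name = environ["VOL_NAME"]
    size_mb = int(environ["VOL_SIZE"])  # size in mebibytes (assumed unit)
    return name, size_mb

def provision_volume(name, size_mb):
    # A real provider would call the appliance's CLI or API here.
    return "created volume %s (%d MiB)" % (name, size_mb)

# Simulated invocation; Ganeti itself would populate os.environ.
fake_env = {"VOL_NAME": "vol1", "VOL_SIZE": "1024"}
print(provision_volume(*read_params(fake_env)))
```

The remaining six scripts follow the same pattern: read the environment, call the appliance, exit non-zero on failure (attach additionally prints the resulting block device path).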
The new 'ext' template

- Introduce a new externally mirrored disk template: ext
- Introduce a new disk option: provider
Using the interface: example

Assuming two appliances visible by a Ganeti cluster and their two ExtStorage providers installed on all Ganeti nodes:

/usr/share/ganeti/extstorage/emc/*
/usr/share/ganeti/extstorage/ibm/*

# gnt-instance add -t ext --disk=0:size=2G,provider=emc
# gnt-instance add -t ext --disk=0:size=2G,provider=emc \
    --disk=1:size=1G,provider=emc \
    --disk=2:size=10G,provider=ibm
# gnt-instance modify --disk 3:add,size=20G,provider=ibm
# gnt-instance migrate testvm1
# gnt-instance migrate -n nodeX.example.com testvm1
ExtStorage Interface dynamic parameters

Support for dynamically passing arbitrary parameters to ExtStorage providers during instance creation/modification, per disk:

# gnt-instance add -t ext --disk=0:size=2G,provider=emc,param1=value1,param2=value2 \
    --disk=1:size=10G,provider=ibm,param3=value3,param4=value4
# gnt-instance modify --disk 2:add,size=3G,provider=emc,param5=value5

The above parameters will be exported to the ExtStorage provider's scripts as environment variables:

EXTP_PARAM1 = str(value1)
EXTP_PARAM2 = str(value2)
...
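The parameter-to-environment mapping above is simple to sketch; the function name is mine, but the EXTP_ prefix and the stringification follow the slide:

```python
def extp_environment(disk_params):
    """Turn per-disk ext parameters into EXTP_* environment variables."""
    return {"EXTP_" + name.upper(): str(value)
            for name, value in disk_params.items()}

env = extp_environment({"param1": "value1", "param2": 42})
print(env["EXTP_PARAM1"])  # -> value1
print(env["EXTP_PARAM2"])  # -> 42 (always exported as a string)
```

Because every value is flattened to a string, provider scripts must parse numbers or lists back themselves.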
The new 'gnt-storage' client

Inspired by gnt-os:

# gnt-storage diagnose
# gnt-storage info
Some images borrowed / modified from Lance Albertson, Iustin Pop,