Replication-Manager Container Infrastructure on OpenSVC clusters · 2018-07-02



DataOps - June 2018

Arnaud VERON - OpenSVC

Stephane VAROQUI - Signal18

Replication-Manager Container Infrastructure on OpenSVC clusters


Replication Manager MRM


What is replication-manager?

• State machine

• Event scheduler

• Monitoring (replication, status, variables, schema)

• DB job scheduler: sender and receiver (backups, logs)

• HA failover

• Multi topology

Master-slave, Multi-master, ring

Multi-source, Gtid, Pseudo-gtid

• Multi route

HAProxy, ProxySQL, Vitess, MaxScale, Consul

Scripts, ShardProxy

• Multi-client (REST API, HTTP, command line)

Replication-Manager - Election on async replicas

What is Signal18?

• Packaging

• Non-regression tests

• Continuous integration build

What is OpenSVC?

• Services orchestration

Docker, KVM, Zone, LXC, ...

• Infrastructure monitor

• Configuration manager

• Service based failover, placement and scaling


MaxScale - ProxySQL

• Database protocol aware

• Database topology aware

• SQL parser for complex filtering

• Funneling, Multiplexing, Pooling

• Need to scale up depending on route and filter complexity

• Pluggable for routes, parsers, protocols, monitors and filters

• BSL license (MaxScale) vs GPL (ProxySQL)

Replication-Manager - Routing

HAProxy

• Protocol agnostic

• Authentication agnostic when used with servers that support the PROXY protocol (MariaDB 10.3, Percona, Amazon)

• Long-time open source: well known, well tested, well documented

• Proven minimal resource usage at layer 7
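The "authentication agnostic" point relies on the PROXY protocol: HAProxy prepends a one-line header carrying the original client address, so a backend that supports it (MariaDB 10.3 and the others listed) can match its host-based grants against the real client rather than the proxy. Below is a minimal sketch of the version 1 (text) header, following the HAProxy PROXY protocol specification; the addresses are made up, and the binary version 2 format is not covered.

```python
import ipaddress


def build_proxy_v1(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Build a PROXY protocol v1 (text) header for a proxied TCP connection."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    if src.version != dst.version:
        raise ValueError("source and destination must share an IP version")
    proto = "TCP4" if src.version == 4 else "TCP6"
    # PROXY <proto> <client ip> <destination ip> <client port> <destination port>\r\n
    return f"PROXY {proto} {src} {dst} {src_port} {dst_port}\r\n".encode("ascii")


def parse_proxy_v1(data: bytes) -> dict:
    """Split a v1 header from the bytes that follow it (e.g. the MySQL protocol stream)."""
    end = data.find(b"\r\n")
    # The specification caps a v1 line at 107 bytes, CRLF included.
    if end < 0 or end + 2 > 107:
        raise ValueError("incomplete or oversized PROXY v1 header")
    fields = data[:end].decode("ascii").split(" ")
    if len(fields) != 6 or fields[0] != "PROXY" or fields[1] not in ("TCP4", "TCP6"):
        raise ValueError("not a PROXY v1 TCP header")
    return {
        "client": (fields[2], int(fields[4])),
        "destination": (fields[3], int(fields[5])),
        "payload": data[end + 2:],
    }
```

The backend only trusts such a header from known proxy addresses; otherwise any client could claim an arbitrary source IP.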

Consul DNS

• Increasingly used for microservices

Scripts or others …

ShardProxy

• MariaDB 10.3 and Spider

• Table discovery across multiple sharded clusters


Failover - False Positive Detection

Every component has its own self-healing behavior, recovers at its own speed, and defines its own failure agenda (SLA), depending on its capacity to keep up with demand and on its internal failure state.

Distinguishing a false-positive failure, which may belong to an auto-recovery scenario, from a real failure is what makes automation complex.

By default, replication-manager will:

Alert and wait for user interaction!


failover-mode = "automatic" vs failover-mode = "manual" (default)


Switchover Workflow


Failover Monitoring Workflow


Replication-Manager - Settings

Minimum settings

# TOPOLOGY
title = "ClusterEnterpriseMasterSlave"
db-servers-hosts = "db11,db12,db13"
db-servers-prefered-master = "db11"
db-servers-credential = "root:mariadb"
db-servers-connect-timeout = 1
replication-credential = "root:mariadb"
# LOG
log-file = "./dashboard/replication-manager.log"

mail-from = "mrm@localhost"
mail-smtp-addr = "localhost:25"
mail-to = "[email protected]"
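The election of a new master among asynchronous replicas can be sketched as a filter-then-rank step. This is a simplified, hypothetical model and not replication-manager's actual algorithm: replication positions are reduced to a single GTID sequence number, `max_delay` plays the role of the failover-max-slave-delay setting, and the preferred master (db-servers-prefered-master) wins only when it is healthy and tied for the most advanced position, which is an assumed rule.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Replica:
    name: str
    gtid_seq: int         # simplified: last applied GTID sequence number
    delay_s: int          # replication delay in seconds
    healthy: bool = True  # replica reachable and replicating


def elect(replicas: Sequence[Replica], preferred: str = "",
          max_delay: int = 30) -> Optional[Replica]:
    """Return the replica to promote, or None when no replica qualifies."""
    # Keep only healthy replicas whose delay is within the allowed bound.
    candidates = [r for r in replicas if r.healthy and r.delay_s <= max_delay]
    if not candidates:
        return None
    best_seq = max(r.gtid_seq for r in candidates)
    most_advanced = [r for r in candidates if r.gtid_seq == best_seq]
    # Honor the preferred master only if it is not behind the others.
    for r in most_advanced:
        if r.name == preferred:
            return r
    # Otherwise break ties by smallest delay, then by name for determinism.
    return min(most_advanced, key=lambda r: (r.delay_s, r.name))
```

Ranking on position first keeps data loss minimal; the preferred master is a tie-breaker, not an override.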

Automatic failover constraints

# failover-limit = 0
# failover-time-limit = 0
# failover-at-sync = false
# failover-max-slave-delay = 30
# failover-restart-unsafe = false
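How these constraints might gate an automatic failover can be sketched as a pure decision function. This is a hypothetical model, not replication-manager's code, and the semantics are assumptions read from the setting names: failover-limit caps the number of automatic failovers (0 meaning unlimited), failover-time-limit is a minimum number of seconds between two failovers (0 meaning no wait), and failover-at-sync only allows a failover while semi-synchronous replication is in sync. failover-max-slave-delay is treated as an election filter and failover-restart-unsafe is left out.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverPolicy:
    mode: str = "manual"     # failover-mode ("manual" = alert and wait)
    limit: int = 0           # failover-limit, 0 assumed to mean unlimited
    time_limit_s: int = 0    # failover-time-limit, 0 assumed to mean no wait
    at_sync: bool = False    # failover-at-sync


@dataclass
class FailoverHistory:
    count: int = 0                   # automatic failovers performed so far
    last_ts: Optional[float] = None  # epoch seconds of the last failover


def failover_decision(policy: FailoverPolicy, history: FailoverHistory,
                      semisync_in_sync: bool, now: Optional[float] = None) -> str:
    """Return "failover" or "alert" (alert and wait for user interaction)."""
    now = time.time() if now is None else now
    if policy.mode != "automatic":
        return "alert"
    if policy.limit and history.count >= policy.limit:
        return "alert"
    if (policy.time_limit_s and history.last_ts is not None
            and now - history.last_ts < policy.time_limit_s):
        return "alert"
    if policy.at_sync and not semisync_in_sync:
        return "alert"
    return "failover"
```

Every rejected check falls back to the default behavior, so a misconfigured constraint degrades to "alert and wait" rather than to an unwanted failover.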

Failback

# autorejoin = true

# autorejoin-script = ""

# autorejoin-semisync = true

# autorejoin-backup-binlog = true

# autorejoin-flashback = false

# autorejoin-mysqldump = false

graphite-metrics = true

graphite-embedded = true

graphite-carbon-api-port = 10002


OpenSVC


Agent Stack (Open-source)

• Clusterware & Orchestrator

• Drivers for Packing Resources in Services

• Service topologies: Failover (A/P), Flex (A/A), Scaler

• Unified Command Line Interface

• Bootstrap and Launch Predefined or Custom Services

• Collect Hardware and Software Information

• Apply Predefined Software Configuration Rules

Collector Stack (Optional)

• Centralized Web portal, Public (SaaS) or Private

• RBAC, Service templates and Configuration Rules

• Back Office Dashboards, Monitoring, Secrets Vault

• REST API


Infrastructure Schema

[Diagram] Two-node OpenSVC cluster:

- Nodes: c1n1 (OVH, 91.121.222.0/24) and c1n2 (OVH, 94.23.29.0/24), Linux OS, ZFS pool for data

- Network: CNI + Weave setup, 10.32.0.0/12 (private class A)

- Core services: odns, oweave, collector, ogwl4

- Replication Manager services: repman, repman-dr

- DB services: two MariaDB instances

- Proxy services: HAProxy


Dual-Node HA Cluster - local storage only

• CNI runtime + Weave plugin installed on both nodes

- Provides a software-defined network layer between both nodes (VXLAN)

- Dockerized network management daemons

• Nodes are joined in an OpenSVC cluster

- Production-grade HA cluster and container orchestrator

- Meta-service “core” manages the subservices “odns”, “oweave”, “ogwl4”, “collector”

- odns: intra-cluster DNS for service-to-service addressing

- oweave: makes sure the nodes get a private network IP

- ogwl4: layer 4 ingress gateway service

- collector: hosts the OpenSVC collector software stack

• Replication Manager layer

- 2 services: “repman”, “repman-dr”

- Using Signal18 Docker images

Infrastructure Details


Name Service Resolution

• Problem: how to provide reliable and dynamic name resolution for provisioned, unprovisioned or scaled services plugged into the private network (10.32.0.0/12)?

• Solution: the “odns” core service

- An active/active service running a PowerDNS server and recursor

- The OpenSVC daemon acts as a PowerDNS remote backend, accessed through the /var/lib/opensvc/dns/pdns.sock unix socket

- The OpenSVC remote backend serves an A record for each service

- IN A nginx.myappcode.svc.myclustername

- Describe the exposed ports in the OpenSVC service config file

- Example: nginx web server, expose = 80/tcp 443/tcp

- The OpenSVC remote backend then serves SRV records

- IN SRV _8080._tcp.nginx.myappcode.svc.myclustername

- IN SRV _8443._tcp.nginx.myappcode.svc.myclustername

- The OpenSVC agent configures container resolvers to use the PowerDNS recursors
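The record names above follow a regular pattern. A small sketch that derives them from a service name, an application code, a cluster name and an expose list, assuming the pattern `<svc>.<app>.svc.<cluster>` for A records and `_<port>._<proto>.<A record name>` for SRV records, exactly as in the examples; the default protocol for a bare port is an assumption.

```python
from typing import List, Tuple


def a_record(svc: str, app: str, cluster: str) -> str:
    """A record name, e.g. nginx.myappcode.svc.myclustername."""
    return f"{svc}.{app}.svc.{cluster}"


def parse_expose(expose: str) -> List[Tuple[int, str]]:
    """Parse an expose value such as '8080/tcp 8443/tcp' into (port, proto) pairs."""
    pairs = []
    for item in expose.split():
        port, _, proto = item.partition("/")
        pairs.append((int(port), proto or "tcp"))  # assumption: bare port means tcp
    return pairs


def srv_records(svc: str, app: str, cluster: str, expose: str) -> List[str]:
    """One SRV name per exposed port, prefixed to the service's A record name."""
    base = a_record(svc, app, cluster)
    return [f"_{port}._{proto}.{base}" for port, proto in parse_expose(expose)]
```

Fed with the ports of the SRV examples, this yields the two names listed above; containers then only need the recursor to find both the address and the port of a peer service.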


Ingress Gateway

• Problem: how to expose services that use private CNI/Weave IP addresses on a public IP?

• Solution: the “ogwl4” core service

- A GoBetween container (http://gobetween.io)

- L4 load balancer and reverse proxy with a REST API

- Owns the public IP address

- A GoBetween Janitor container (https://docs.opensvc.com)

- Listens to OpenSVC cluster events and configures GoBetween

- Service SRV records are used as GoBetween “servers” backends

- Describe the port mapping in the OpenSVC service config file

- Private exposure of an nginx web server: expose = 80/tcp 443/tcp

- Public exposure of these ports: nodemgr set --kw env.igw_gobtw_bind="80/tcp-0.0.0.0:8080 443/tcp-0.0.0.0:8443"
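The igw_gobtw_bind value packs one mapping per whitespace-separated item. Reading the example as `<service port>/<protocol>-<bind address>:<public port>` (an interpretation of the example, not a documented grammar), a small parser could look like this:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Binding:
    service_port: int  # port exposed privately by the service
    proto: str         # tcp or udp
    bind_addr: str     # address GoBetween listens on
    public_port: int   # port published on the public IP


def parse_gobtw_bind(value: str) -> List[Binding]:
    """Parse items like '80/tcp-0.0.0.0:8080' into Binding records."""
    bindings = []
    for item in value.split():
        private, _, public = item.partition("-")
        port, _, proto = private.partition("/")
        addr, _, pub_port = public.rpartition(":")
        if not (port and proto and addr and pub_port):
            raise ValueError(f"malformed binding: {item!r}")
        bindings.append(Binding(int(port), proto, addr, int(pub_port)))
    return bindings
```

With this reading, public port 8080 forwards to the services exposing 80/tcp, which the Janitor finds through their SRV records.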


Provisioning Workflow

• [End User] Log in to the repman web portal

• [End User] Build a new cluster and fill in the required information

• [RM] Build OpenSVC service templates and tags, and post them to the collector

• [RM+OpenSVC] Instantiate templates on c1n1/c1n2

• [RM+OpenSVC] Provision services on c1n1/c1n2 and run compliance playbooks

• [RM+OpenSVC] Start services and check operational status

Replication Manager


API to CLI Mapping

• The Replication Manager command below produces an OpenSVC service configuration

• This template is injected into the OpenSVC collector by RM

• It can also be used as is: once redirected into a file, it can be promoted to an OpenSVC service with the command:

OpenSVC Commands

$ replication-manager-cli api --url "https://127.0.0.1:10005/api/clusters/3x_mariadb102_multidomain_2x_proxysql/servers/db16045609442561543509/service" > /tmp/mysvc.template.conf

$ sudo svcmgr -s mydb create --provision --config=/tmp/mysvc.template.conf
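The same two steps can be scripted. A sketch using only the Python standard library; the endpoint path is copied from the replication-manager-cli example, while the bearer-token header and skipping TLS verification for a self-signed certificate on 127.0.0.1 are assumptions (obtaining the token is not shown).

```python
import ssl
import subprocess
import urllib.parse
import urllib.request
from typing import List


def service_url(base: str, cluster: str, server: str) -> str:
    """Build the service-template endpoint used in the replication-manager-cli example."""
    return (f"{base}/api/clusters/{urllib.parse.quote(cluster, safe='')}"
            f"/servers/{urllib.parse.quote(server, safe='')}/service")


def fetch_template(url: str, token: str, verify_tls: bool = True) -> str:
    """GET the generated OpenSVC config (ASSUMPTION: bearer-token authentication)."""
    ctx = ssl.create_default_context()
    if not verify_tls:  # e.g. a self-signed certificate on 127.0.0.1
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, context=ctx) as resp:
        return resp.read().decode("utf-8")


def create_cmd(svcname: str, config_path: str) -> List[str]:
    """Same svcmgr invocation as the second command above."""
    return ["sudo", "svcmgr", "-s", svcname, "create", "--provision",
            f"--config={config_path}"]


def provision(base: str, cluster: str, server: str, svcname: str, token: str,
              path: str = "/tmp/mysvc.template.conf") -> None:
    """Fetch the template, write it to a file, then create and provision the service."""
    template = fetch_template(service_url(base, cluster, server), token, verify_tls=False)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(template)
    subprocess.run(create_cmd(svcname, path), check=True)
```

Passing the command as a list avoids shell quoting issues with service names and paths.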


• db1604… is the service name

• The [DEFAULT] section is used to set up global parameters for the service

• nodes = list of node names where the service can run

• topology = flex means that this service is an active/active setup

• flex_primary = nodename designates the service master, used for data replication

OpenSVC DB Service 1/6

• rollback = false: do not stop the service if one resource fails during service startup

• app = myappcode is used to “attach” the service to a well-known application code in the IT organization

• docker_daemon_private = false: use the global Docker daemon rather than spawning a Docker daemon dedicated to the service

• docker_data_dir = /path/to/dir: store all Docker data in this directory (only used with docker_daemon_private = true)

• docker_daemon_args = ...: arguments passed to the Docker daemon (only used with docker_daemon_private = true)

• {env.abcdef} is a reference to the variable abcdef in the [env] section
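To make the keyword list concrete, a sketch that renders a minimal [DEFAULT] section with Python's configparser. The node names reuse c1n1/c1n2 from the infrastructure schema and app reuses myappcode; a space-separated nodes list is assumed, and interpolation is disabled so values are written verbatim.

```python
import configparser
import io
from typing import Sequence


def default_section(nodes: Sequence[str], primary: str, app: str) -> str:
    """Render a minimal [DEFAULT] section using the keywords listed above."""
    if primary not in nodes:
        raise ValueError("flex_primary must be one of the service nodes")
    cfg = configparser.ConfigParser(interpolation=None)  # keep values verbatim
    cfg["DEFAULT"] = {
        "nodes": " ".join(nodes),        # assumption: space-separated list
        "topology": "flex",              # active/active
        "flex_primary": primary,         # master used for data replication
        "rollback": "false",
        "app": app,
        "docker_daemon_private": "false",
    }
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()
```

The check on `primary` encodes the obvious constraint that the flex primary must be one of the nodes allowed to run the service.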


• [fs#00] section is used to declare a filesystem resource managed by OpenSVC

• type = zfs tells the agent that the filesystem is ZFS

• dev = data/abc tells the agent to use dataset abc in the ZFS pool data

• mnt = /path/to/mnt specifies the filesystem mount point

OpenSVC DB Service 2/6

• size = 2g is used only once, at service provisioning time, to create a 2 GB dataset in the ZFS pool

• mkfs_opt = ...: use this keyword if you need particular filesystem creation options

• standby = true asks the agent to keep this resource up on every node, including standby nodes

• post_provision = ... is a trigger launched only once, after provisioning, to run the compliance checks


• The [ip#01] section declares an IP resource managed by OpenSVC

• tags = ...: user-defined tags used for resource group management

• type = cni tells the agent to rely on the CNI IP driver

OpenSVC DB Service 3/6

• container_rid = container#0001 tells the agent that the interface (eth12) has to be created in the container described by the container#0001 resource

• network = repman: use the CNI network named repman to provision an IP (the network must be listed by 'nodemgr network ls')
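Because container_rid points at another section of the same file, a dangling reference is an easy mistake to make when editing a template by hand. A hypothetical lint check (not an OpenSVC feature) that parses the file with configparser and reports [ip#...] sections whose container_rid names no existing section:

```python
import configparser
from typing import List, Tuple


def dangling_container_rids(cfg_text: str) -> List[Tuple[str, str]]:
    """Return (section, container_rid) pairs whose target section is missing."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(cfg_text)
    problems = []
    for section in cfg.sections():
        if section.startswith("ip#") and cfg.has_option(section, "container_rid"):
            rid = cfg.get(section, "container_rid")
            if not cfg.has_section(rid):
                problems.append((section, rid))
    return problems


SAMPLE = """
[ip#01]
type = cni
network = repman
container_rid = container#0001

[container#0001]
type = docker
"""
```

configparser only treats `#` as a comment marker at the start of a line, so section names and values such as container#0001 survive parsing intact.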


• [container#0001] section is used to declare a container resource managed by OpenSVC

• type = docker tells the agent to rely on the Docker container driver

• run_image = ...: Docker image to use

• run_args = ...: docker run options

--net=none: do not let Docker manage the network layer

-i -t: keep stdin open and allocate a TTY

-v …: bind mount so that the container uses the same time as the hypervisor

• run_command = ...: command to run in the container

OpenSVC DB Service 4/6

• tags = ...: user-defined tags used for resource group management

• run_args = ...: docker run options

--net=container...: share the network namespace of {svcname}.container.0001

-e ...: set environment variables inside the container

-v …: bind mounts from the hypervisor into the container
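The two run_args variants describe a common pattern: a first container owns the network namespace (Docker leaves it unconfigured, the CNI ip resource plugs the interface into it), and a second container joins that namespace with Docker's `--net=container:<name>` mode. A sketch that assembles both argument lists; the service name, images, password and the /etc/localtime mount are hypothetical placeholders.

```python
from typing import Dict, List, Optional, Sequence


def docker_run_args(image: str, name: str, net: str,
                    env: Optional[Dict[str, str]] = None,
                    volumes: Sequence[str] = (),
                    command: Sequence[str] = (),
                    tty: bool = False) -> List[str]:
    """Assemble a docker run argument list from run_image/run_args/run_command-like inputs."""
    args = ["docker", "run", "--name", name, f"--net={net}"]
    if tty:
        args += ["-i", "-t"]              # keep stdin open, allocate a TTY
    for key, value in (env or {}).items():
        args += ["-e", f"{key}={value}"]  # environment inside the container
    for spec in volumes:
        args += ["-v", spec]              # host_path:container_path[:options]
    args.append(image)
    args += list(command)
    return args


svc = "mydb"  # hypothetical service name
# First container: no Docker networking; the CNI ip resource adds the interface.
holder = docker_run_args("busybox", f"{svc}.container.0001", "none", tty=True,
                         volumes=["/etc/localtime:/etc/localtime:ro"], command=["sh"])
# Second container: joins the first one's namespace, so it shares its CNI address.
db = docker_run_args("mariadb:10.3", f"{svc}.container.0002",
                     f"container:{svc}.container.0001",
                     env={"MYSQL_ROOT_PASSWORD": "secret"})
```

Splitting the namespace holder from the database container lets the database container be restarted without losing the IP address held by the first one.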


• [task#01] section is used to declare a task resource managed by OpenSVC

• schedule = ...: user-defined schedule for automatic task execution (@1 = once per minute)

• command = ...: command executed when the task is triggered

OpenSVC DB Service 5/6

• user = ... command is executed as this user identity

• run_requires = ... list of conditions that must be met to allow the task execution

OpenSVC tasks are useful given service mobility: any scheduled job “follows” the service. No more outdated crontabs on backup/DRP servers.
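A simplified model of when such a task fires: the schedule interval must have elapsed and every run_requires condition must hold. This sketch only understands the `@<minutes>` shorthand from the slide (OpenSVC schedules accept richer expressions), and modeling run_requires as a mapping from resource id to accepted states is an assumption, not the agent's syntax.

```python
from typing import Dict, Optional, Set


def parse_interval_minutes(schedule: str) -> int:
    """Parse only the '@<minutes>' shorthand ('@1' = once per minute)."""
    digits = schedule[1:]
    if not schedule.startswith("@") or not digits.isdigit() or int(digits) == 0:
        raise ValueError(f"unsupported schedule in this sketch: {schedule!r}")
    return int(digits)


def task_due(schedule: str, last_run_min: Optional[int], now_min: int,
             run_requires: Dict[str, Set[str]], states: Dict[str, str]) -> bool:
    """True when the interval has elapsed and every required resource is in an accepted state."""
    interval = parse_interval_minutes(schedule)
    elapsed = last_run_min is None or now_min - last_run_min >= interval
    requirements_ok = all(states.get(rid) in accepted
                          for rid, accepted in run_requires.items())
    return elapsed and requirements_ok
```

Because the check reads the local resource states, a backup task only runs on the node where the database container is actually up, which is what lets the job follow the service.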


OpenSVC DB Service 6/6

• The [env] section is a user-defined section declaring variables to use as references

• Any reference like {env.abc} is resolved using the value of variable abc in the [env] section

• mysql_root_password can be fetched from the collector key/value store, named the safe
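The reference mechanism can be sketched as a regular-expression substitution over a parsed config. This is an illustration of the idea, not the agent's resolver: it performs a single pass, reads values only from a local [env] section (not from the collector safe), and uses a dummy password.

```python
import configparser
import re
from typing import Dict

REF = re.compile(r"\{env\.([A-Za-z0-9_]+)\}")


def resolve_env_refs(cfg_text: str) -> Dict[str, Dict[str, str]]:
    """Replace every {env.name} with the value of name from the [env] section."""
    # A non-existent default_section keeps [DEFAULT] as a plain section,
    # so configparser does not copy its keys into every other section.
    cfg = configparser.ConfigParser(interpolation=None, default_section="__none__")
    cfg.read_string(cfg_text)
    env = dict(cfg.items("env")) if cfg.has_section("env") else {}

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1).lower()  # configparser lower-cases option names
        if name not in env:
            raise KeyError(f"undefined reference {{env.{name}}}")
        return env[name]

    # Single pass: references inside [env] values are not expanded recursively.
    return {section: {key: REF.sub(substitute, value)
                      for key, value in cfg.items(section)}
            for section in cfg.sections()}


SAMPLE = """
[DEFAULT]
app = myappcode

[container#0002]
run_args = -e MYSQL_ROOT_PASSWORD={env.mysql_root_password}

[env]
mysql_root_password = secret
"""
```

Failing loudly on an undefined reference is preferable to starting a container with a literal `{env.x}` string as its password.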


OpenSVC Compliance 1/2

Key Concepts

• The OpenSVC collector provides a configuration management framework, with features similar to Ansible, Chef, Puppet, ...

• Workflow

– Set up configuration targets using the web interface (or the REST API), such as:

• MySQL uid/gid, configuration file permissions, MySQL sql_mode=oracle

• SSL certificate generation, package installation, symlink creation, ...

– If needed, set up a context for this configuration item, such as:

• Only on Linux systems with more than 4 CPUs and 32 GB RAM, located in France

• Only if the OpenSVC service is tagged with both “FOO” and “BAR”

– Check and/or apply those configurations on the systems concerned
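The "context" step amounts to a predicate over node facts and service tags. A hypothetical model of the two example contexts, where a configuration item applies only if every node filter holds and every required tag is present; reading "more than" as strict for both CPU and RAM thresholds is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Node:
    os: str
    cpus: int
    mem_gb: int
    country: str


@dataclass
class Ruleset:
    """A configuration item plus the context in which it applies."""
    name: str
    node_filters: List[Callable[[Node], bool]] = field(default_factory=list)
    required_tags: FrozenSet[str] = frozenset()

    def applies(self, node: Node, service_tags: Iterable[str]) -> bool:
        # Every node predicate must hold and every required tag must be present.
        return (all(check(node) for check in self.node_filters)
                and self.required_tags <= set(service_tags))


# The two contexts from the slide; ruleset names are illustrative.
big_linux_in_france = Ruleset(
    "mysql-sql-mode",
    node_filters=[
        lambda n: n.os == "Linux",
        lambda n: n.cpus > 4,
        lambda n: n.mem_gb > 32,
        lambda n: n.country == "FR",
    ],
)
foo_and_bar = Ruleset("ssl-autogen", required_tags=frozenset({"FOO", "BAR"}))
```

Keeping the context separate from the configuration item lets the same item be reused with different targeting rules.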


OpenSVC Compliance 2/2

Key Concepts

• Based on those features, Replication Manager not only builds the service configuration file automatically, but also triggers the OpenSVC compliance subsystem to apply the system and database setup preconfigured in the RM interface

• This is how the database is fully operational right after the provisioning step


OpenSVC Secret Vault

Key Concepts

• The OpenSVC collector can store secrets

– key/value pairs, strings, files, binary objects

– REST API or GUI management

– Compliant with the embedded RBAC model (any member of a group can access, read-only or read-write, the objects the group is authorized to)

– Automated Git tracking

• This feature helps avoid:

– Secrets hidden in flat files on the filesystem

– Secrets exposed in the process environment

– Secrets written directly into OpenSVC service configuration files

• Point to the safe ID or UUID in the service configuration file, and the agent manages retrieval


OpenSVC svcmon

• The Threads section summarizes per-node states of the core orchestrator functions

• The Nodes section reports key performance metrics, used as inputs by some placement policy algorithms in service orchestration

• The Services section inventories the services deployed on the cluster, with per-node state information

O up

o stdby up

X down

x stdby down

! warn

P unprovisioned

* frozen

^ leader node, or non-optimal service placement

/ not applicable
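When scripting around svcmon output, the legend above can be turned into a lookup table. A minimal sketch, assuming a status cell may combine several of these symbols (for example an up instance that is also the placement leader); parsing the full svcmon layout is out of scope.

```python
from typing import List

SVCMON_LEGEND = {
    "O": "up",
    "o": "stdby up",
    "X": "down",
    "x": "stdby down",
    "!": "warn",
    "P": "unprovisioned",
    "*": "frozen",
    "^": "leader node or non-optimal placement",
    "/": "not applicable",
}


def decode_cell(cell: str) -> List[str]:
    """Translate every known symbol in a status cell; other characters are skipped."""
    return [SVCMON_LEGEND[ch] for ch in cell if ch in SVCMON_LEGEND]
```

Note that case matters: "O" and "o" distinguish a running instance from a standby resource that is up.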


OpenSVC svcmgr

• svcmgr -s <svcname> print status: display the per-resource status of the service

• svcmgr -s <svcname> print config: display the service configuration

• svcmgr -s <svcname> {stop, start, takeover, giveback, ...}: manage the service

• svcmgr -s <svcname> {stop, start} --rid container#2: manage individual resources

R Running

M Monitored

D Disabled

O Optional

E Encap

P Provisioned

S Standby

/ Not Applicable
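For automation, the command forms listed above can be wrapped in a small helper that builds argument lists and runs them. This sketch only whitelists the actions named on the slide; the subprocess wrapper is a convenience, not part of OpenSVC.

```python
import subprocess
from typing import List, Optional

ACTIONS = {"print status", "print config", "start", "stop", "takeover", "giveback"}


def svcmgr_cmd(svcname: str, action: str, rid: Optional[str] = None) -> List[str]:
    """Build 'svcmgr -s <svcname> <action> [--rid <rid>]' as an argument list."""
    if action not in ACTIONS:
        raise ValueError(f"action not covered by this sketch: {action!r}")
    cmd = ["svcmgr", "-s", svcname, *action.split()]
    if rid:
        cmd += ["--rid", rid]  # restrict the action to one resource
    return cmd


def run(cmd: List[str]) -> str:
    """Run a command and return its standard output, raising on a non-zero exit."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

Typical use: `run(svcmgr_cmd("mydb", "stop", rid="container#2"))` stops a single container resource while leaving the rest of the service running.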


Q&A