High Availability DevOps - NERCOMP...
High Availability DevOps
HA Features for Docker Swarm and GitLab
High-Availability DevOps
Deploying and managing a DevOps environment requires
attention to the elimination of single points of failure.
Using open source High-Availability and Desired State
Configuration tools, we address the availability and
maintainability of our overall DevOps environment and the
resources and services that it requires.
Topics to be Covered
• DevOps single points of failure
• Tools and methods to ameliorate risk
• Infrastructure as Code
• SRE Error Budgeting
Environment Overview
• Test and Prod Swarms, each 5 nodes
– Docker CE
– Ubuntu 18.04
• GitLab CE for CI/CD and Docker Registry
• Apache 2 Load Balancers for Apps
• SaltStack codebase defines infrastructure
Failure Modes
Example DevOps Infrastructure
[Diagram. Layer labels: Orchestration; Codebase, Integration & Deployment; Infrastructure & Services; Infrastructure as Code. Component labels: TEST and PROD swarms (nodes 1-5 each), GitLab, Databases, Services, Applications, SaltStack.]
Application Dependencies
[Diagram: a network LB in front of swarm nodes 1-5, with infrastructure services LDAP, SMTP, and SQL.]
• RISK: 1 load balancer for ingress
• RISK: node availability, ingress availability
Deployment Dependencies
• Repository
– RISK: single VM
– REMEDIATION: HA Deployment or Cloud
• CI/CD Script
– RISK: TEST==PROD?
– REMEDIATION: Same CI Code with Interpolation
• Deploy Container (Runner)
– RISK: Runner Available
– REMEDIATION: Pacemaker Bundle
• Validation (audit + health + monitor)
– RISK: Did it deploy and stay deployed?
– REMEDIATION: Auditing, Healthcheck, Monitoring
Swarm Topology
[Diagram: five swarm nodes, node1 through node5, with role labels Manager, Leader, Runner, and Ingress.]
Swarm Topology Failure Response
● A partition might lead to a leader election.
● The mesh network means any node can have an ingress to a stack’s service.
● Swarm will try to maintain the replica requirement.
[Diagram: swarm nodes node1 through node5.]
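Whether a partition leaves the swarm with a working leader comes down to manager quorum: Swarm managers use Raft consensus, which needs a strict majority of managers. The sketch below is illustrative arithmetic only. The function names are hypothetical, and the five-manager case assumes every node of a five-node swarm is a manager.

```python
def quorum(managers: int) -> int:
    """Smallest strict majority of the manager set."""
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """Managers that can be lost while a majority still remains."""
    return managers - quorum(managers)


def side_keeps_quorum(side_size: int, managers: int) -> bool:
    """True if one side of a partition still holds a majority of managers."""
    return side_size >= quorum(managers)


if __name__ == "__main__":
    # Assumed example: five managers split 3/2 by a network partition.
    print(quorum(5), tolerated_failures(5))
    print(side_keeps_quorum(3, 5), side_keeps_quorum(2, 5))
```

With five managers, a 3/2 split leaves the three-node side able to elect a leader, while the two-node side cannot schedule or update services until the partition heals.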
(our) Swarm Integration 1
● To run docker stack deploy, a GitLab runner (a container) must be on a manager node. We are making all peer nodes managers and using a Pacemaker bundle to ensure the container starts.
● Having a DNS VIP ingress requires network and Docker reconfiguration and restart. (We have a script in salt-call, and we call that from a Pacemaker alert monitor.)
(our) Swarm Integration 2
● Although Docker Swarm is supposed to ensure that the requested number of replicas is started, in practice there is occasionally a deficit, especially after an event.
● After a cluster event, another salt-call script runs that simply updates any service not running enough replicas.
● Automated deployment and service updates require valid registry authorization. (We use CI_TOKEN in deployment with a credential helper.)
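The salt-call script itself is not shown in the slides. As a rough illustration of the replica check it describes, the sketch below parses text in the shape produced by docker service ls --format '{{.Name}} {{.Replicas}}' and builds a docker service update --force command for each under-replicated service. The function names and the sample service names are hypothetical, and forcing an update is only one plausible way to prompt the scheduler.

```python
def under_replicated(service_ls_output: str) -> list:
    """Return names of services whose running count is below the desired count.

    Expects one service per line: "<name> <running>/<desired>".
    Lines that do not fit that shape are skipped.
    """
    deficits = []
    for line in service_ls_output.splitlines():
        parts = line.split()
        if len(parts) < 2 or "/" not in parts[1]:
            continue
        running, desired = parts[1].split("/", 1)
        if not (running.isdigit() and desired.isdigit()):
            continue
        if int(running) < int(desired):
            deficits.append(parts[0])
    return deficits


def update_commands(names: list) -> list:
    """Build one argv list per service; the caller decides whether to run them."""
    return [["docker", "service", "update", "--force", name] for name in names]


if __name__ == "__main__":
    # Hypothetical sample output, not taken from the slides.
    sample = "web_app 2/3\nweb_db 1/1\nintranet_cgi 0/2\n"
    for argv in update_commands(under_replicated(sample)):
        print(" ".join(argv))
```

Returning argv lists rather than executing them keeps the check testable and leaves the decision to run the commands to the calling script.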
(our) Load Balancer
• Apache2 with mod_proxy
• Location directive to map URI to a service
• One load balancer: unscheduled
maintenance impossible
• One proxy entry: single point of ingress
Application Environment
● Applications behind LB could be in
container environments, on VM or in
cloud.
● Container environment is Docker Swarm
● Services generally provisioned by VMs
Load Balancer Topologies
[Diagram: one LB proxying to swarm nodes 1-5.]
<Location /app1>
    RedirectMatch "(.*)/app1$" \
        "https://appsdemo.holycross.edu/apps1/$1"
    Require all granted
    ProxyPass https://swarmdemo1.holycross.edu:6549 retry=5 \
        acquire=3000 timeout=600 Keepalive=On
    ...
    ProxyPass https://swarmdemo5.holycross.edu:6549 retry=5 \
        acquire=3000 timeout=600 Keepalive=On
    ProxyPassReverse https://swarmdemo1.holycross.edu:6549
    ...
    ProxyPassReverse https://swarmdemo5.holycross.edu:6549
    SetEnv proxy-sendchunked 1
</Location>
Load Balancer Clustered Ingress
[Diagram: one LB proxying to a Pacemaker-clustered ingress in front of swarm nodes 1-5.]
<Location /app1>
    RedirectMatch "(.*)/app1$" \
        "https://appsdemo.holycross.edu/apps1/$1"
    Require all granted
    ProxyPass https://swarmdemo.holycross.edu:6549 retry=5 \
        acquire=3000 timeout=600 Keepalive=On
    ProxyPassReverse https://swarmdemo.holycross.edu:6549
    SetEnv proxy-sendchunked 1
</Location>
Clustered Load Balancer
[Diagram: two Pacemaker clusters, one for a pair of load balancers (LB) and one for the ingress in front of swarm nodes 1-5.]
<Location /app1>
    RedirectMatch "(.*)/app1$" \
        "https://appsdemo.holycross.edu/apps1/$1"
    Require all granted
    ProxyPass https://swarmdemo.holycross.edu:6549 retry=5 \
        acquire=3000 timeout=600 Keepalive=On
    ProxyPassReverse https://swarmdemo.holycross.edu:6549
    SetEnv proxy-sendchunked 1
</Location>
Dual Ingress
[Diagram: two Pacemaker clusters; a pair of load balancers (LB) proxying to two ingress addresses, A and B, in front of swarm nodes 1-5.]
<Location /app1>
    RedirectMatch "(.*)/app1$" \
        "https://appsdemo.holycross.edu/apps1/$1"
    Require all granted
    ProxyPass https://swarmdemoA.holycross.edu:6549 retry=5 \
        acquire=3000 timeout=600 Keepalive=On
    ProxyPass https://swarmdemoB.holycross.edu:6549 retry=5 \
        acquire=3000 timeout=600 Keepalive=On
    ProxyPassReverse https://swarmdemoA.holycross.edu:6549
    ProxyPassReverse https://swarmdemoB.holycross.edu:6549
    SetEnv proxy-sendchunked 1
</Location>
Reducing Risk

| | 1 Ingress, 1 Balancer | HA Ingress, 1 Balancer | HA Ingress, HA Balancer | HA Ingress (2) |
|---|---|---|---|---|
| Single Point Failure? | YES | YES | NO | NO |
| Transition Ingress (s) | Intervention | 13 sec. | 13 sec. | < 13 sec. |
| Transition Balancer (s) | Intervention | Intervention | 1 sec. | < 1 sec. |
HA Load Balancer
● Configure 2 (or more) Apache servers with
proxy configuration in a Pacemaker
configuration with a VIP.
● If a load balancer crashes or needs
maintenance, Pacemaker can move the load
balancer service to an alternate node, manually
or automatically.
DevOps Storage Models
• Storage reliability and manageability are already fairly high because of clustering and LVM.
• Many storage requirements can be
managed using databases, repositories,
or tagged storage.
Storage Failure Modes
• One way to manage larger storage usage
by a service is to map it to a Docker
volume through a share/mount.
• This presents an availability issue for the
sharing node, either for node failure or a
maintenance window.
Tools & Methods
Tools & Methods Overview
• HA Cluster
– Pacemaker/Corosync
• Desired State Configuration
– SaltStack
• Highly Available Storage
– DRBD, S2D
High-Availability Clustering
• IPaddr2 resource: the virtual IP resource is auto-managed by the cluster.
• Alerts: event handlers run on nodes before or after a cluster event, used to update configuration.
• Docker bundle: ensures that GitLab runner containers are on each node.
Desired State Configuration
• Configuration for Docker, the cluster, alerts
and the ingress VIP stored in a YAML pillar
database.
• (push) salt state.apply to build Docker
nodes, configure alerts, VIP, etc.
• (pull) salt-call state.apply to update running
configuration of a node’s daemon.json.
Redundant Swarm Ingress
pillar YAML configuration for Virtual IP:
swarmtest_vip_cib:
  resource:
    swarmtest_vip:
      resource_type: "ocf:heartbeat:IPaddr2"
      resource_options:
        - 'ip=192.168.1.120'
        - 'cidr_netmask=32'
        - 'iflabel=IP_VIRTUAL'
Docker Node Self-Configuration
● At initial node build, or on an event, SaltStack reads the
configuration in serialized (JSON) form from a Salt
‘pillar’ data set.
● The Salt ‘pillar’ is also dynamically configured with
current network configuration, independently of the
logical configuration of the Swarm.
● Changes to the /etc/docker/daemon.json file will trigger
a restart of Docker (i.e., with updated network
addresses.)
Daemon_JSON Salt pillar fragment
Daemon_JSON:
  {{grains.get('docker-swarm-name','')}}:
    hosts:
      - "fd://"
{% for interface,addresses in grains.get('ip4_interfaces',{}).items() %}
{% if interface is not match('docker*') %}
{% for ip in addresses %}
      - "tcp://{{ip}}:2376"
{# addresses #}
{% endfor %}
{% endif %}
{% endfor %}
    storage-driver: overlay2
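The loop in the pillar fragment can be easier to follow outside of Jinja. The sketch below mirrors it in plain Python under stated assumptions: the input dictionary stands in for the ip4_interfaces grain, the function name is hypothetical, and the prefix check only approximates the template's match('docker*') test.

```python
import json


def daemon_hosts(ip4_interfaces: dict, port: int = 2376) -> list:
    """Build the Docker daemon "hosts" list from an interface-to-addresses map.

    Starts with the systemd socket ("fd://") and adds one TCP listener per
    IPv4 address, skipping Docker's own interfaces such as docker0.
    """
    hosts = ["fd://"]
    for interface, addresses in ip4_interfaces.items():
        if interface.startswith("docker"):  # approximates match('docker*')
            continue
        for ip in addresses:
            hosts.append(f"tcp://{ip}:{port}")
    return hosts


if __name__ == "__main__":
    # Hypothetical grain data for one node.
    grains = {"eth0": ["192.168.1.11"], "docker0": ["172.17.0.1"]}
    config = {"hosts": daemon_hosts(grains), "storage-driver": "overlay2"}
    print(json.dumps(config, indent=2))
```

Because the list is derived from the node's current addresses, re-rendering it after a VIP moves is what changes daemon.json and, in turn, triggers the Docker restart.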
Docker.daemon state fragment 1
Daemon_Running:
  service.running:
    - name: docker
    - enable: True
    - restart: True
    - watch:
      - file: /etc/docker/daemon.json
Docker.daemon state fragment 2
Daemon_JSON_{{pillar_name}}:
  file.serialize:
    - name: /etc/docker/daemon.json
    - dataset_pillar: "{{pillar_path}}"
    - formatter: json
    - merge_if_exists: True
    - show_changes: True
    - user: root
    - group: root
    - mode: 644
Redundant Services
● load balancers
(Apache, NGINX)
● smtp gateway
(Postfix, sendmail)
Redundant Filesystems and Shares
● Vendor solutions
○ EMC Isilon (CIFS+NFS)
○ Netapp (CIFS+NFS)
○ Pure Storage (CIFS+NFS)
● Microsoft
○ Azure Stack HCI (CIFS/ReFS)
● Open source
○ DRBD (NFS)
○ CEPH (CIFS, NFS, S3)
Docker Volumes and HA
● Most of our containerized applications either
use a database directory, or manage data on a
docker volume through a repository.
● We have a few static websites where we need
HA disks which we map onto docker volumes
via network filesystems.
Mapping NFS to Docker
● Allow docker swarm to manage the NFS
or CIFS mount in the compose file.
● HA disk server keeps mount available
during unexpected or scheduled
downtime.
Compose NFS Mount Definition
volumes:
  - type: volume
    source: web-cgibinintranet
    target: $HC_WEB_CGIBININTRANET_MOUNTPOINT
    volume:
      nocopy: true
Compose NFS Volume Definition
volumes:
  web-cgibinintranet:
    driver_opts:
      type: "nfs"
      o: "nfsvers=4,addr=sanfs1.holycross.edu,ro"
      device: ":/sanfs/cgibinintranet"
Compose CIFS Mount Definition
volumes:
  - type: volume
    source: web-cifs
    target: $HC_ALT_LEGACY_MOUNTPOINT
    volume:
      nocopy: true
Compose CIFS Volume Definition
volumes:
  web-cifs:
    driver_opts:
      type: "cifs"
      o: "username=${USER},password=${PASS},domain=${DOMAIN},iocharset=utf8,uid=${UID},gid=${GID}"
      device: "//${SMB_SERVER}/web/legacy"
Infrastructure as Code
Motivations and Benefits
• Apply DevOps to system administration
– Repository, pipelines, issues, documentation
• Push configuration to build standardized templates,
validate, deploy and audit.
• Pull configuration for events, triggers, self-configuration.
• Extend and reuse code across platforms.
Build and Deployment
• We apply about 333 formulas on a typical Linux deploy
to ensure desired configuration.
• 5-10 minutes to deploy a Linux template after adding it
to authentication domain and defining some metadata.
• Build and deploy a Docker node in about 20-30 minutes
using a base template deploy.
• We also build complex clusters and application services
on top of nodes built this way.
Validation and Audit
• We validate and apply over 200 CIS rules on a base Linux deployment, and additional CIS rules for Docker, MySQL, Postgres, Apache, as well as internally developed best practices.
• Standardized tags on best practice rules can be parsed into JSON to generate compliance reports and documentation.
Self-Configuration
• Interactively fix a configuration knowing the change is
already documented in code.
• An event or trigger can reconfigure the running system
according to current state rather than the state at build
time.
SRE Error Budgeting
Infeasible 100%
• As much as we’d like to have 100% uptime, we cannot
possibly guarantee that, and all of our infrastructure
needs occasional maintenance.
• We perform scheduled maintenance, but it is difficult to schedule and disruptive. DevOps, clustering, and virtualization have generally increased our ability to safely perform unscheduled maintenance.
Current Monitoring
• Our current monitoring is cloud based, and simply
measures service availability.
• We need richer indicators with measurable objectives,
that lead to defined responses.
Service Level Indicators: SLI
• SLI - service level indicator.
– A good SLI should be at least a scalar, e.g., instead
of measuring ‘uptime’, we could measure ‘errors per
interval.’
– Try to standardize common SLIs for reuse.
– Naturally, some SLIs will be specific to the service.
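As a minimal illustration of a scalar SLI such as "errors per interval", the sketch below buckets error timestamps into fixed-width intervals. The function name, the 60-second default, and the sample timestamps are assumptions for illustration, not part of the monitoring described here.

```python
from collections import Counter


def errors_per_interval(error_timestamps, interval_seconds=60):
    """Count errors in each fixed-width interval.

    Timestamps are seconds since any common origin; the result maps an
    interval index (timestamp // interval_seconds) to an error count.
    Intervals with no errors are simply absent from the result.
    """
    counts = Counter(int(ts // interval_seconds) for ts in error_timestamps)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical error times, in seconds.
    print(errors_per_interval([0, 10, 59, 60, 125]))
```

A count per interval can be compared against a threshold, which a bare up/down availability check cannot offer.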
Service Level Objectives: SLO
• Set internal objectives which will be used to manage
change.
• Typically, once you set the SLO, say, “99.5% of transactions will have a latency of less than 500ms”, you define your error budget as 1 - n, here 0.5% of transactions. In that case, if more than 0.5% of transactions take 500ms or longer, you have exceeded your error budget. When we exceed our error budget, we change our focus from new features to stability.
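The error budget arithmetic can be sketched as follows. This reads the example SLO as a per-transaction latency threshold, which is an assumption; the function names and the sample latencies are hypothetical.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of transactions allowed to miss the objective (1 - n)."""
    return 1.0 - slo_target


def budget_consumed(latencies_ms, threshold_ms=500.0, slo_target=0.995):
    """Share of the error budget used by the observed transactions.

    A value above 1.0 means the budget is exceeded.
    """
    if not latencies_ms:
        return 0.0
    slow = sum(1 for ms in latencies_ms if ms >= threshold_ms)
    return (slow / len(latencies_ms)) / error_budget(slo_target)


def budget_exceeded(latencies_ms, threshold_ms=500.0, slo_target=0.995):
    """True when slow transactions outnumber what the budget allows."""
    return budget_consumed(latencies_ms, threshold_ms, slo_target) > 1.0


if __name__ == "__main__":
    # Hypothetical window: 1000 transactions, 10 of them slow (1%).
    window = [100.0] * 990 + [800.0] * 10
    print(budget_consumed(window), budget_exceeded(window))
```

Expressing consumption as a ratio makes the response rule easy to state: below 1.0, feature work continues; above 1.0, the focus shifts to stability.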
Service Level Agreement: SLA
• The SLA will be the agreement you have with the
customer, and it will generally be a looser objective than
the SLO.
• As with the SLO, the SLA needs consequences for exceeding the error budget; in the case of an internal customer, perhaps a review.
Tool Versions
• ClusterLabs pacemaker 1.1.18
• Red Hat corosync 2.4.3
• Docker CE 19.03.4
• Ubuntu Server 18.04
• Windows Server 2016 Datacenter 1607
• VirtualBox (for demos) 6.0.14
Some Unreviewed Tools
• Load balancing with IPVS/VRRP
– Keepalived
– NGINX
– Traefik
• Storage Alternatives
– S2D Azure Stack HCI
– CEPH