Incident Management with Workflows
![Page 1: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/1.jpg)
Patrick Hoolboom September 22, 2016
Incident Management with Workflows
© 2016 BROCADE COMMUNICATIONS SYSTEMS, INC. INTERNAL USE ONLY
![Page 2: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/2.jpg)
What is a Workflow?
![Page 3: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/3.jpg)
What Is a Workflow?
• A sequence of processes through which a piece of work passes from initiation to completion
• Process as Code
• Living Documentation
– Document your process in an easily human readable, executable format
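The "process as code" idea above can be sketched in a few lines of Python. This is only an illustration, not StackStorm or Mistral syntax; the step functions, the `run_workflow` helper, and the `nginx` service name are all hypothetical.

```python
from typing import Callable

# A step takes the shared context and returns the updated context.
Step = Callable[[dict], dict]


def check_service(ctx: dict) -> dict:
    # Hypothetical diagnostic step: record an observed status.
    ctx["status"] = "down"
    return ctx


def restart_service(ctx: dict) -> dict:
    # Hypothetical remediation step: only acts when the service is down.
    if ctx.get("status") == "down":
        ctx["status"] = "restarted"
    return ctx


def run_workflow(steps: list[tuple[str, Step]], ctx: dict) -> dict:
    """Pass the work (ctx) through each step, from initiation to completion."""
    ctx.setdefault("history", [])
    for name, step in steps:
        ctx = step(ctx)
        ctx["history"].append(name)  # record which steps ran, in order
    return ctx


# The workflow definition is readable and executable at the same time.
RESTART_WORKFLOW = [
    ("check_service", check_service),
    ("restart_service", restart_service),
]

result = run_workflow(RESTART_WORKFLOW, {"service": "nginx"})
print(result["status"], result["history"])
```

Because the ordered list of steps is the definition, updating the process and updating its documentation are the same change.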
![Page 4: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/4.jpg)
Event Driven Automation 2.0
[Diagram: the Observe, Orient, Decide, Act loop, with examples of event-driven automation]
• FBAR (saving 1532 hours/day)
• Naoru
• Nurse
• Winston (powered by StackStorm)
• Azure Automation
• Mistral workflow service
• StackStorm automation platform
![Page 5: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/5.jpg)
When to use a workflow
![Page 6: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/6.jpg)
When to use a workflow
• Clearly defined process
• When multiple systems or services need to be touched
• Frequently performed tasks
![Page 7: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/7.jpg)
Why Use Workflows?
![Page 8: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/8.jpg)
Why Use Workflows?
• Consistency
– Trust that your automations will perform the same tasks every time for a given event
• Speed
– Reduce time to resolution for an incident
• Audit
– Creates a clear audit trail of what was done when
• Connect Disparate Systems
![Page 9: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/9.jpg)
Tools…
![Page 10: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/10.jpg)
What Can Be Automated?
• Security checks
– On malware detection in a VM, isolate network port on a switch
• Blue-green app deployment
– On Jenkins tests passed, bring up new VM cluster, deploy and configure app, set load balancer to send % of traffic to new app, monitor, roll forward, or back out
• Networking
– On BGP peer goes down: collect troubleshooting data, post on Slack & create JIRA ticket
– On link aggregation member error, check load; if capacity of rest of LAG bundle is enough, disable link with error
• Restart a down service
– On monitoring event, bounce a service
• OpenStack orphan VM clean-up
– On orphans detected, shut down, email owner, keep for a few days, delete
• NFV
– Nokia, AT&T, with Mistral and OpenStack
• OpenStack VM evacuation on hardware failures
– On host RAID failure, get list of impacted VMs, email VM owners, evacuate VMs, create JIRA ticket for hardware replacement.
• Cassandra “node down” recovery
– Replace a node on alert
• Clean up disk space
– On monitoring event, clean up disk space
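The link aggregation case in the list above comes down to a small capacity check. Here is a rough sketch, assuming link capacity and current load are known in Mbps. The `Link` class, the `can_disable` function, the link names, and the 80% headroom threshold are assumptions for illustration and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    capacity_mbps: float
    healthy: bool = True


def can_disable(links: list[Link], faulty: str, load_mbps: float,
                max_utilization: float = 0.8) -> bool:
    """Return True if the remaining healthy links can carry the current load.

    max_utilization is an assumed safety margin, not a value from the talk.
    """
    remaining = sum(link.capacity_mbps for link in links
                    if link.healthy and link.name != faulty)
    return remaining > 0 and load_mbps <= remaining * max_utilization


bundle = [Link("eth1", 10_000), Link("eth2", 10_000), Link("eth3", 10_000)]

# 12 Gbps of traffic fits in the two remaining links (20 Gbps * 0.8 = 16 Gbps).
print(can_disable(bundle, "eth2", load_mbps=12_000))  # True
# 18 Gbps does not fit, so the faulty link should stay up.
print(can_disable(bundle, "eth2", load_mbps=18_000))  # False
```

Encoding the threshold as a parameter keeps the decision explicit and reviewable, which is the point of putting the process in a workflow.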
![Page 11: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/11.jpg)
StackStorm
![Page 12: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/12.jpg)
Architecture
[Diagram: StackStorm architecture]
• Clients: Web GUI, CLI, ChatOps
• Platform: REST API, Rules Engine (IFTTT.yml), Workflow Engine, KV Store, Sensor Containers, Action Runners, Master Content Repo, AMQP message bus, to Audit…
• Plugins: Sensor Plugins (inbound integrations), Action Plugins (outbound integrations)
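The flow the architecture suggests, where inbound integrations emit events, a rules engine matches them, and outbound integrations act, can be approximated in plain Python. This is a toy model for illustration only. It does not use StackStorm's real rule format or API, and every name in it (`Rule`, `RulesEngine`, `disk.low`, `clean_logs`) is made up.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rule:
    trigger: str                      # event type the rule listens for
    criteria: dict                    # payload fields that must match exactly
    action: Callable[[dict], str]     # what to run when the rule fires


@dataclass
class RulesEngine:
    rules: list[Rule] = field(default_factory=list)
    audit: list[str] = field(default_factory=list)

    def dispatch(self, trigger: str, payload: dict) -> list[str]:
        """Run the action of every rule matching this event; log each run."""
        results = []
        for rule in self.rules:
            if rule.trigger != trigger:
                continue
            if all(payload.get(k) == v for k, v in rule.criteria.items()):
                outcome = rule.action(payload)
                self.audit.append(f"{trigger} -> {outcome}")
                results.append(outcome)
        return results


def clean_logs(payload: dict) -> str:
    # Stand-in for an action plugin (outbound integration).
    return f"cleaned logs on {payload['host']}"


engine = RulesEngine()
engine.rules.append(Rule("disk.low", {"mount": "/var/log"}, clean_logs))

# A sensor plugin (inbound integration) would emit events like these.
print(engine.dispatch("disk.low", {"host": "web301", "mount": "/var/log"}))
print(engine.dispatch("disk.low", {"host": "web301", "mount": "/home"}))
```

Only the first event satisfies the rule's criteria, so only it produces an action and an audit entry.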
![Page 13: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/13.jpg)
Workflow Design Patterns
● Diagnostic Workflows
● Remediation Workflows
![Page 14: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/14.jpg)
Workflow Design Patterns: Diagnostic Workflows
• Troubleshooting and data gathering steps
• No remediations or changes to the system
• Good way to “get your feet wet” with workflows
![Page 15: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/15.jpg)
Workflow Design Patterns: Remediation Workflows
• Fix the issue!
• Should be triggered after diagnostic workflows if applicable
![Page 16: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/16.jpg)
Workflow Use Cases During an Incident
● Facilitated Troubleshooting
● Auto-Remediation
![Page 17: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/17.jpg)
Workflow Use Cases: Facilitated Troubleshooting
• Useful if you don’t quite trust the automation
– Gain confidence in your workflows
• Faster Time to Resolution
• Consistent Data Collection
• Diagnostic workflow with notifications
– Send data to user via chat or ticketing system
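A diagnostic workflow with notifications might look roughly like the following: format the collected data the same way every time, then hand it to each notification channel. The two lists below are hypothetical stand-ins for a chat client and a ticketing client, and the findings are invented sample data.

```python
from typing import Callable

# A notifier takes a message and delivers it somewhere (chat, ticket, ...).
Notifier = Callable[[str], None]


def build_report(host: str, findings: dict[str, str]) -> str:
    """Render diagnostic output in a fixed order so every report looks the same."""
    lines = [f"Diagnostics for {host}:"]
    for check in sorted(findings):
        lines.append(f"- {check}: {findings[check]}")
    return "\n".join(lines)


def notify_all(report: str, notifiers: list[Notifier]) -> None:
    # The workflow only reports; it makes no changes to the system.
    for send in notifiers:
        send(report)


# Hypothetical stand-ins for a chat client and a ticketing client.
chat_messages: list[str] = []
ticket_comments: list[str] = []

findings = {"uptime": "12 days", "disk /var/log": "97% used"}
report = build_report("web301", findings)
notify_all(report, [chat_messages.append, ticket_comments.append])
print(report)
```

The engineer still decides what to do, but starts from the same data in the same layout on every incident.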
![Page 18: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/18.jpg)
Workflow Use Cases: Auto-Remediation
• Trusted Automation
– Will make automated changes to the system
• Much Faster Time to Resolution
• Consistent Solutions
• Less Pager Fatigue
![Page 19: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/19.jpg)
Example
● Low Disk Space Event
![Page 20: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/20.jpg)
Automation Example
[Diagram: Service, Monitoring, Automation, Incident Management, Engineer]
• Web301 is “low disk”
• Event: “low disk on web301”
• Resolve known cases, fast. Is it /var/log? Clean up!
• Unknown problem, need a human
• Wake up, buddy. Something real is going on…
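The decision in this example, clean up when the cause is known and page a human otherwise, can be sketched as a single function. The `triage_low_disk` name, the per-directory usage input, and the 50% threshold are assumptions made for this sketch, not details from the talk.

```python
def triage_low_disk(host: str, usage_by_dir: dict[str, float],
                    log_dir: str = "/var/log",
                    threshold: float = 0.5) -> str:
    """Pick the next step for a low-disk event.

    usage_by_dir maps a directory to the fraction of the disk it occupies.
    The 50% threshold is an assumption made for this example.
    """
    if usage_by_dir.get(log_dir, 0.0) >= threshold:
        return f"auto-remediate: clean up {log_dir} on {host}"
    return f"escalate: page an engineer about {host}"


print(triage_low_disk("web301", {"/var/log": 0.7, "/home": 0.2}))
print(triage_low_disk("web301", {"/var/log": 0.1, "/srv": 0.8}))
```

The first call takes the known-case path and the second falls through to escalation, so the engineer is only paged for problems the automation does not recognize.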
![Page 21: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/21.jpg)
![Page 22: Incident Management with Workflows](https://reader031.fdocuments.us/reader031/viewer/2022030302/587d898a1a28abcd648b5e0d/html5/thumbnails/22.jpg)
Thank You!
● Email: [email protected]
● Twitter: @DoriftoShoes