Keeping it cool in a crisis (Relate Live London)
#RelateLive
Keeping it Cool in a Crisis
Dave Dyson, Zendesk Senior Customer Service Evangelist
@dave_dyson
Crisis Communication: What we’ll cover
• Definitions
• Goals
• Ingredients for Success
• Our Red Alert Process
• Metrics
Definitions
What is this “Red Alert” that you speak of? Categories of Bad
• Service Disruption
• Security Incident
• Legal Entanglement
• Public Relations Nightmare
• Physical Emergency
Goals
The “Prime Directive”: Your Mission
• Repair Trust with Customers
• Restore Service ASAP
• Consistent Response
• Efficiency
WARMTH
COMPETENCE
Ingredients for Success
Planning Ahead: A Haiku
A Red Alert is
not the time to figure out
how to handle one.
KEEP CALM
AND
HAVE A PLAN
DOCUMENT IT
TRAIN
TAKE OWNERSHIP
AND
COMMUNICATE
Stay Calm: Emotions are Contagious
As a leader, your emotions carry extra weight.
If you’re nervous, your team will be too.
Empathize: Show You Care
Active Listening: Helps Defuse Anxiety
• Leave your ego behind
• Frustration is normal
• Let them speak
• Verbal “nods”
• Reflect what they say
• Validate their emotions
“If you fail to plan, you are planning to fail” - Benjamin Franklin
“vaj nab luj SoH, luj nab SoH!” - Benjamin Franklin (original Klingon)
Red Alert Plans: What to Include
• Process
• Roles & Duties
• Staffing
• Tools
• Communications
• Special Cases
• Metrics
Stakeholders: This isn’t just about you.
• Support Team
• Engineering
• Operations
• Security
• Marketing / PR
• Sales
• Account Management
• Customer Success
• Executives
Your Plan is Not Set in Stone: Hone & Refine
Documentation: If it’s not written, it won’t be remembered
• Complete
• Process & People
• Clear
• Accessible
• Up to Date
Training: Practice Makes Perfect
• Onboarding
• Shadowing
• Video
• Drills
Take Ownership
• Empathize
• Apologize
• Don’t Shift Blame
Communication: “All decks, this is the bridge…”
• Internal
• External
• Timely
• Accurate
• Compassionate
• Honest
• Transparent
• One Voice
Red Alerts at Zendesk: The Plan in Action
Red Alert Process Documentation: Keep it secure, keep it safe
• Process Documentation
• Checklist
• Twitter & Flowdock
• How to get Notified
• Training Materials
• On-Call Schedules
Zendesk Red Alert Process Overview
1. Assess the Situation
2. Alert the Incident Team
3. Communicate to Stakeholders
4. Public Acknowledgement
5. Status Updates
6. Resolve the Issue
7. Wrap Up
Assessment: Is This a Red Alert?
• Is there a current, ongoing threat to the security of customer or Zendesk data?
• Are customers prevented from performing a critical Zendesk function?
• Is the event impacting multiple customers?
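The three assessment questions above can be turned into a quick triage check. This is a minimal Python sketch, not Zendesk's tooling: the `IncidentReport` and `is_red_alert` names are hypothetical, and the rule for combining the answers (an ongoing security threat always qualifies; a blocked critical function qualifies only when multiple customers are affected) is an assumption, since the slide lists the questions without saying how they combine.

```python
from dataclasses import dataclass


@dataclass
class IncidentReport:
    """Yes/no answers to the three Red Alert assessment questions."""
    ongoing_security_threat: bool      # current threat to customer or Zendesk data?
    critical_function_blocked: bool    # customers prevented from a critical function?
    multiple_customers_impacted: bool  # is the event impacting multiple customers?


def is_red_alert(report: IncidentReport) -> bool:
    """Return True if the event should be declared a Red Alert.

    Assumed rule (not stated on the slide): a security threat always
    qualifies; a blocked critical function qualifies only when it
    affects multiple customers.
    """
    if report.ongoing_security_threat:
        return True
    return report.critical_function_blocked and report.multiple_customers_impacted


if __name__ == "__main__":
    # A critical function is down for many customers: declare a Red Alert.
    print(is_red_alert(IncidentReport(False, True, True)))   # True
    # Only one customer is affected: handle as a normal ticket.
    print(is_red_alert(IncidentReport(False, True, False)))  # False
```

Encoding the rule this way makes the team's threshold explicit and easy to revise as the plan is honed.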
Assessment: Specific Criteria
Critical Functionality:
• Account Access
• Performance Degradation
• Channels
• Core Partner Services
• Business Rules / Routing
• Agent Collision
• Reporting
• Apps Framework
Security Threat:
• Compromised data
• Spoofing
• Stolen passwords
• DDoS Attack
Internal Alerts: Tools and recipients
PagerDuty: Incident Response Team (Support, Operations)
Email: Internal Stakeholders (Engineering, Operations, Marketing)
Flowdock: Wider Team Visibility (Support, Operations)
Company-Wide Visibility
The Incident Response Team: The Who’s Who
Incident Lead (Support)
• Owns the problem ticket for Support
• Gathers scope and impact info to share with the Incident team
• Updates the support team on status
Support Duty Manager (Support)
• Manages Support resources during the incident
• Manages all customer-facing messaging
Operations Manager (Operations)
• Manages Operations resources during the incident
• Confirms facts about the nature of the incident
• Makes decisions necessary to restore service
Incident Manager (Support Operations)
• Assists with large incidents
• Crafts the public post-mortem from the internal version
Communication Flow
Customers
Customer Advocates
Incident Lead
Support Duty Manager
Operations Manager
Engineering & Operations Staff
Staffing: On-Call Duty Rotation
Weekly on-call shifts for each role
• 8 hours x 7 days, based in AMER, APAC, and EMEA
Avoid consecutive duty shifts
• Standard: one shift every 6 weeks
Support Duty Manager has an on-call backup as well
• Allows for escalation when the primary duty manager is unavailable
PagerDuty used for scheduling & alerts
• Dashboard shows who is on call & the upcoming schedule
• Individuals can customize alerts for incidents and upcoming shifts
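The rotation rules can be sanity-checked with a simple round-robin schedule. In this Python sketch (hypothetical function names and people; PagerDuty itself is not called), each region keeps its own pool of people for a role, and a pool of six gives every person one weekly shift every 6 weeks with no back-to-back weeks. Three regions covering 8 hours each add up to 24-hour coverage.

```python
from itertools import cycle

REGIONS = ["AMER", "APAC", "EMEA"]
SHIFT_HOURS = 8  # per region per day; 3 regions x 8 hours = 24 hours


def weekly_rotation(pool: list[str], weeks: int) -> list[str]:
    """Assign one person per week in round-robin order.

    With a pool of N people, each person is on call once every N weeks,
    so a pool of 6 matches the "one shift every 6 weeks" standard.
    """
    if len(pool) < 2:
        raise ValueError("need at least two people to avoid consecutive shifts")
    people = cycle(pool)
    return [next(people) for _ in range(weeks)]


def build_schedule(pools: dict[str, list[str]], weeks: int) -> dict[str, list[str]]:
    """Build an independent weekly rotation for each region's pool."""
    return {region: weekly_rotation(pool, weeks) for region, pool in pools.items()}


if __name__ == "__main__":
    assert len(REGIONS) * SHIFT_HOURS == 24
    # Hypothetical pools of six people per region for one role.
    pools = {region: [f"{region}-{i}" for i in range(1, 7)] for region in REGIONS}
    for region, weeks in build_schedule(pools, 8).items():
        print(region, weeks)
```

Round-robin is the simplest scheme that satisfies both rules; a real schedule would also handle vacations and swaps, which this sketch ignores.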
The Zencident Room
Red Alert: Problem & Incident Tickets
Problem Ticket
• Created internally (manually or by our Red Alert App)
• Uses the Service Disruption ticket form
• Technical updates & discussions recorded here
• No risk of inadvertent customer communication
• Solves attached incidents automatically
Incident Tickets
• Customer reports, attached to the Problem ticket
• Proactive tickets attached as well
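The problem/incident relationship is one-to-many: customer reports and proactive tickets attach to a single problem ticket, and solving the problem closes them all at once. The Python sketch below models only that relationship; the class and method names are hypothetical and it does not call the Zendesk API.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    """An incident ticket: a customer report or a proactive ticket."""
    id: int
    status: str = "open"
    public_comments: list[str] = field(default_factory=list)


@dataclass
class ProblemTicket(Ticket):
    """Internal-only ticket that tracks one Red Alert."""
    incidents: list[Ticket] = field(default_factory=list)
    internal_notes: list[str] = field(default_factory=list)

    def attach(self, incident: Ticket) -> None:
        self.incidents.append(incident)

    def add_note(self, note: str) -> None:
        # Technical updates stay internal, so nothing here reaches customers.
        self.internal_notes.append(note)

    def solve(self, all_clear_message: str) -> None:
        """Solve the problem and every attached incident in one step."""
        self.status = "solved"
        for incident in self.incidents:
            incident.public_comments.append(all_clear_message)
            incident.status = "solved"


if __name__ == "__main__":
    problem = ProblemTicket(id=100)
    for ticket_id in (101, 102, 103):
        problem.attach(Ticket(id=ticket_id))
    problem.add_note("Ops: failover in progress")
    problem.solve("Service restored. Post-mortem: <Help Center article link>")
    print([t.status for t in problem.incidents])  # ['solved', 'solved', 'solved']
```

Keeping technical discussion on the problem ticket and sending only the final message to incidents is what removes the risk of accidental customer communication.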
Service Disruption Ticket Form: Additional Fields
Added to the Problem Ticket:
• Impacted Pods
• Alert Duration
• Number of Incidents
• Link to public Help Center article
• Link to internal Operations incident record
• Checkbox: Post-mortem published
Communicating to Customers: Key Points
• Acknowledge Quickly
• Take Ownership
• Provide Scope and Impact
• Regular Progress Updates
• Be Transparent
Communication Cadence: What to Share, and When
Event | Time After Alert Called | Notes
Acknowledgement of Incident | ASAP, but within 15 minutes | The sooner we acknowledge an incident publicly, the less anxious customers become.
Description of incident scope and impact | ASAP, but within 30 minutes | Incident scope should be specific enough for customers to self-identify if they are impacted.
Status updates on investigation/resolution | Every 30 minutes thereafter | When possible, status updates should provide new information to demonstrate progress is being made toward resolution.
"All clear" | ASAP when reached | As soon as the Operations Manager and Support Duty Manager agree.
Pointer to Post-Mortem summary | Include with "All Clear" | Post-mortems should normally be posted within 3 business days of each incident.
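The cadence above translates into concrete deadlines once the alert time is known. This Python sketch reads "every 30 minutes thereafter" as counting from the 30-minute scope deadline, and counts the 3 business days for post-mortems by skipping Saturdays and Sundays only (public holidays are ignored). Both readings are assumptions, and the function names are illustrative.

```python
from datetime import datetime, timedelta

ACK_DEADLINE = timedelta(minutes=15)     # acknowledgement of incident
SCOPE_DEADLINE = timedelta(minutes=30)   # description of scope and impact
UPDATE_INTERVAL = timedelta(minutes=30)  # status updates thereafter


def communication_deadlines(alert_called: datetime, until: datetime) -> list[tuple[str, datetime]]:
    """List each required communication and its latest due time, up to `until`."""
    deadlines = [
        ("acknowledgement", alert_called + ACK_DEADLINE),
        ("scope and impact", alert_called + SCOPE_DEADLINE),
    ]
    next_update = alert_called + SCOPE_DEADLINE + UPDATE_INTERVAL
    while next_update <= until:
        deadlines.append(("status update", next_update))
        next_update += UPDATE_INTERVAL
    return deadlines


def add_business_days(start: datetime, days: int) -> datetime:
    """Advance `days` weekdays from `start` (weekends skipped, holidays not)."""
    current = start
    while days > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days -= 1
    return current


if __name__ == "__main__":
    alert = datetime(2024, 1, 5, 9, 0)  # a Friday morning
    for label, due in communication_deadlines(alert, alert + timedelta(hours=2)):
        print(f"{due:%H:%M}  {label}")
    print("post-mortem due:", add_business_days(alert, 3).date())  # 2024-01-10
```

An incident called on a Friday morning thus needs its post-mortem by the following Wednesday.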
Customer Communications: Additional Details
Tweet Acknowledgement to @ZendeskOps
• Use list of sample tweets
• Marketing suspends tweets to @Zendesk
Respond to Customer Tickets
• As received
• Attach to Problem ticket
Publish public Help Center article
• Use article template
• Timeline and eventual post-mortem
• Location: Service Disruptions section of public KB
Proactive Communication to Top Customers
• Tickets & phone calls to affected customers
Timed Status Update Tweets to @ZendeskOps
• Default: every 15 minutes
• Set expectations if it will be longer
• Update Help Center article with tweets
“Tell them what you would want to know”
- Susan Griffin-Black, EO Products
Resolution & Wrap-Up
Once service is deemed restored by the incident team:
1. Ticket Resolution
• Solving the Problem ticket solves all attached Incidents
• Include link to Help Center article for post-mortem
2. Send “All Clear” Tweet
• Include link to Help Center article for post-mortem
3. Post-Mortem (within 3 business days)
• Operations team writes internal version
• Support Operations Incident Manager edits it for public consumption and publishes it to the Help Center article
Service Disruption Post-Mortem: What should it include?
• Scope and Customer Impact
• Incident Duration
• Communication Timeline
• Process / Training Gaps
• Recommendations for Improvement
Special Situations: Exceptions to the Rule
Security Incidents: Involve the Security Team
Zopim Outage: Communicate via Zopim Twitter & Facebook
Partner Outage (Twilio, GoodData):
• Report incident to Partner
• Refer to Partner’s system status page
Zendesk Support Instance is Down:
• Use backup voice service (IfByPhone)
• Outgoing communication via Tweets only
Shift Handoff: Live handoffs only, no email
Overlapping Incidents: Separate Red Alerts & Problem Tickets
The Red Alert App: Custom App in our Zendesk Instance
1. Displays links to active Red Alerts
2. Turns a ticket into a Red Alert ticket
• PagerDuty, Ticket Form & fields
3. Sends Social Media Communications
• Twitter, Facebook
• Composition, templates, character counts
• Salutation and closing
• “All Clear” option adds link to Help Center article
• Creates or updates Help Center article
4. Proactive Ticket Communications
• Lists top customers, with Pod & feature info
• Composition, templates, all-clear option
• Creates or updates Incident tickets
• Updates Yammer
5. All actions add Internal Notes to Problem ticket
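The app's tweet composer (salutation, body, closing, character count, and an optional all-clear link) can be approximated as below. This is a minimal Python sketch, not the Red Alert App's code: `compose_update` is a hypothetical name, the 140-character default reflects Twitter's original limit (raised to 280 in 2017), and plain `len()` ignores Twitter's own counting rules, such as counting every link as a fixed-length t.co URL.

```python
from typing import Optional

TWEET_LIMIT = 140  # Twitter's original limit; raised to 280 in 2017


def compose_update(
    body: str,
    salutation: str = "",
    closing: str = "",
    all_clear_link: Optional[str] = None,
    limit: int = TWEET_LIMIT,
) -> str:
    """Join the parts of a status tweet and enforce the character limit."""
    parts = [part for part in (salutation, body, closing) if part]
    if all_clear_link:
        # The "All Clear" option appends the Help Center post-mortem link.
        parts.append(all_clear_link)
    text = " ".join(parts)
    if len(text) > limit:
        raise ValueError(f"{len(text)} characters exceeds the {limit}-character limit")
    return text


if __name__ == "__main__":
    # Hypothetical template text.
    tweet = compose_update(
        "We're investigating slow ticket loading on Pod 5.",
        salutation="Hi all:",
        closing="Next update in 15 min.",
    )
    print(len(tweet), tweet)
```

Rejecting an over-long tweet, rather than silently truncating it, keeps the agent in control of what customers read.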
Metrics: Red Alert Impact Report (Insights)
• Number of Red Alerts
• Number of attached Incidents
• Total Support Handle Time
• Estimated Support Cost ($/ticket)
• Number of Proactive Tickets
• Top Red Alerts (by # of incidents)
• Red Alerts by “About” field
• Customer Satisfaction over time
• Process Scorecard
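Most of the report's figures are sums and rankings over Red Alert records. In this Python sketch the `RedAlert` fields and `impact_report` are hypothetical, and the estimated cost is assumed to be attached incidents multiplied by a flat cost per ticket; the slide gives the $/ticket basis but not exactly which tickets are counted.

```python
from dataclasses import dataclass


@dataclass
class RedAlert:
    name: str
    incidents: int              # customer tickets attached to the problem ticket
    handle_time_minutes: float  # total support handle time across those tickets
    proactive_tickets: int


def impact_report(alerts: list[RedAlert], cost_per_ticket: float, top_n: int = 3) -> dict:
    """Aggregate Red Alert records into report figures."""
    total_incidents = sum(a.incidents for a in alerts)
    ranked = sorted(alerts, key=lambda a: a.incidents, reverse=True)
    return {
        "red_alerts": len(alerts),
        "attached_incidents": total_incidents,
        "total_handle_time_minutes": sum(a.handle_time_minutes for a in alerts),
        # Assumption: cost covers attached incidents only, not proactive tickets.
        "estimated_support_cost": total_incidents * cost_per_ticket,
        "proactive_tickets": sum(a.proactive_tickets for a in alerts),
        "top_red_alerts": [a.name for a in ranked[:top_n]],
    }


if __name__ == "__main__":
    # Hypothetical sample data.
    alerts = [
        RedAlert("Pod 5 latency", incidents=120, handle_time_minutes=600, proactive_tickets=10),
        RedAlert("Email delivery delay", incidents=30, handle_time_minutes=90, proactive_tickets=0),
    ]
    print(impact_report(alerts, cost_per_ticket=5.0))
```

Satisfaction trends and the "About" breakdown would need per-ticket data, so they are left out of this sketch.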
The Future! Where No One Has Gone Before
System Status Page (in Beta)
• Loads more detail!
Improved Red Alert App
• More capabilities
Wider Proactive Communication
• More channels
• More customers
Incident Management as a Job
• Not just a role
Conclusion
WARMTH
COMPETENCE
KEEP CALM
AND
PLAN AHEAD
DOCUMENT IT
TRAIN
TAKE OWNERSHIP
AND
COMMUNICATE
Smooth Sailing: “Second star to the right, and straight on ’til morning…”
Q & A