Keeping it cool in a crisis (Relate Live London)

#RelateLive

Transcript of Keeping it cool in a crisis (Relate Live London)

Page 1: Keeping it cool in a crisis (Relate Live London)

#RelateLive

Page 2: Keeping it cool in a crisis (Relate Live London)

#RelateLive

Keeping it Cool in a Crisis

Page 3: Keeping it cool in a crisis (Relate Live London)

Dave Dyson

Zendesk Senior Customer Service Evangelist

@dave_dyson

Page 17: Keeping it cool in a crisis (Relate Live London)

Crisis Communication

What we’ll cover

• Definitions
• Goals
• Ingredients for Success
• Our Red Alert Process
• Metrics

Page 18: Keeping it cool in a crisis (Relate Live London)

Definitions

Page 19: Keeping it cool in a crisis (Relate Live London)

What is this “Red Alert” that you speak of?

Categories of Bad

• Service Disruption
• Security Incident
• Legal Entanglement
• Public Relations Nightmare
• Physical Emergency

Page 20: Keeping it cool in a crisis (Relate Live London)

Goals

Page 23: Keeping it cool in a crisis (Relate Live London)

The “Prime Directive”

Your Mission

• Repair Trust with Customers
• Restore Service ASAP
• Consistent Response
• Efficiency

Page 29: Keeping it cool in a crisis (Relate Live London)

WARMTH

COMPETENCE

Page 30: Keeping it cool in a crisis (Relate Live London)

WARMTH

COMPETENCE

Page 31: Keeping it cool in a crisis (Relate Live London)

Ingredients for Success

Page 32: Keeping it cool in a crisis (Relate Live London)

Planning Ahead

A Haiku

A Red Alert is
not the time to figure out
how to handle one.

Page 33: Keeping it cool in a crisis (Relate Live London)

KEEP CALM

AND

HAVE A PLAN
DOCUMENT IT
TRAIN
TAKE OWNERSHIP

AND

COMMUNICATE

Page 34: Keeping it cool in a crisis (Relate Live London)

Stay Calm

Emotions are Contagious

As a leader, your emotions carry extra weight.

If you’re nervous, your team will be too.

Page 35: Keeping it cool in a crisis (Relate Live London)

Empathize

Show You Care

Page 36: Keeping it cool in a crisis (Relate Live London)

Active Listening

Helps Defuse Anxiety

• Leave your ego behind
• Frustration is normal
• Let them speak
• Verbal “nods”
• Reflect what they say
• Validate their emotions

Page 37: Keeping it cool in a crisis (Relate Live London)

“If you fail to plan, you are planning to fail” - Benjamin Franklin

Page 38: Keeping it cool in a crisis (Relate Live London)

“vaj nab luj SoH, luj nab SoH!” - Benjamin Franklin (original Klingon)

Page 39: Keeping it cool in a crisis (Relate Live London)

Red Alert Plans

What to Include

• Process
• Roles & Duties
• Staffing
• Tools
• Communications
• Special Cases
• Metrics

Page 40: Keeping it cool in a crisis (Relate Live London)

Stakeholders

This isn’t just about you.

• Support Team
• Engineering
• Operations
• Security
• Marketing / PR
• Sales
• Account Management
• Customer Success
• Executives

Page 41: Keeping it cool in a crisis (Relate Live London)

Your Plan is Not Set in Stone

Hone & Refine

Page 42: Keeping it cool in a crisis (Relate Live London)

Documentation

If it’s not written, it won’t be remembered

• Complete
• Process & People
• Clear
• Accessible
• Up to Date

Page 43: Keeping it cool in a crisis (Relate Live London)

Training

Practice Makes Perfect

• Onboarding
• Shadowing
• Video
• Drills

Page 44: Keeping it cool in a crisis (Relate Live London)

Take Ownership

• Empathize
• Apologize
• Don’t Shift Blame

Page 45: Keeping it cool in a crisis (Relate Live London)

Communication

“All decks, this is the bridge…”

• Internal
• External
• Timely
• Accurate
• Compassionate
• Honest
• Transparent
• One Voice

Page 46: Keeping it cool in a crisis (Relate Live London)

Red Alerts at Zendesk

The Plan in Action

Page 47: Keeping it cool in a crisis (Relate Live London)

Red Alert Process Documentation

Keep it secure, keep it safe

• Process Documentation
• Checklist
• Twitter & Flowdock
• How to get Notified
• Training Materials
• On-Call Schedules

Page 48: Keeping it cool in a crisis (Relate Live London)

Zendesk Red Alert Process Overview

1. Assess the Situation
2. Alert the Incident Team
3. Communicate to Stakeholders
4. Public Acknowledgement
5. Status Updates
6. Resolve the Issue
7. Wrap Up

Page 49: Keeping it cool in a crisis (Relate Live London)

Assessment

Is This a Red Alert?

• Is there a current, ongoing threat to the security of customer or Zendesk data?
• Are customers prevented from performing a critical Zendesk function?
• Is the event impacting multiple customers?
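The three assessment questions above can be sketched as a small decision helper. The slide does not say how the answers combine, so the rule below is an assumption for illustration only: a security threat qualifies on its own, while a service problem qualifies when it blocks a critical function for multiple customers. The names `Assessment` and `is_red_alert` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    """Answers to the three assessment questions (hypothetical structure)."""
    security_threat: bool            # current, ongoing threat to customer or Zendesk data
    critical_function_blocked: bool  # customers cannot perform a critical function
    multiple_customers: bool         # more than one customer is impacted


def is_red_alert(a: Assessment) -> bool:
    # ASSUMPTION: the combination rule is not given on the slide.
    # Here a security threat alone triggers an alert; a functional problem
    # triggers one only when it affects multiple customers.
    return a.security_threat or (a.critical_function_blocked and a.multiple_customers)


# Example: a login outage affecting many accounts.
outage = Assessment(security_threat=False,
                    critical_function_blocked=True,
                    multiple_customers=True)
print(is_red_alert(outage))
```

Writing the rule down as code (or as a checklist) keeps the call consistent no matter who is on duty; adjust the condition to match your own policy.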

Page 50: Keeping it cool in a crisis (Relate Live London)

Assessment

Specific Criteria

Critical Functionality:
• Account Access
• Performance Degradation
• Channels
• Core Partner Services
• Business Rules / Routing
• Agent Collision
• Reporting
• Apps Framework

Security Threat:
• Compromised data
• Spoofing
• Stolen passwords
• DDoS Attack

Page 51: Keeping it cool in a crisis (Relate Live London)

Internal Alerts

Tools and recipients

PagerDuty: Incident Response Team (Support, Operations)

Email: Internal Stakeholders (Engineering, Operations, Marketing)

Flowdock: Wider Team Visibility (Support, Operations)

Company-Wide Visibility

Page 52: Keeping it cool in a crisis (Relate Live London)

The Incident Response Team

The Who’s Who

Incident Lead (Support)
• Owns the problem ticket for Support
• Gathers scope and impact info to share with Incident team
• Updates support team on status

Support Duty Manager (Support)
• Manages Support resources during the incident
• Manages all customer-facing messaging

Operations Manager (Operations)
• Manages Operations resources during the incident
• Confirms facts about the nature of the incident
• Makes decisions necessary to restore service

Incident Manager (Support Operations)
• Assists with large incidents
• Crafts public post-mortem from internal version

Page 53: Keeping it cool in a crisis (Relate Live London)

Communication Flow

Customers

Customer Advocates

Incident Lead

Support Duty Manager

Operations Manager

Engineering & Operations Staff

Page 54: Keeping it cool in a crisis (Relate Live London)

Staffing

On-Call Duty Rotation

Weekly on-call shifts for each role
• 8 hours x 7 days – based in AMER, APAC, EMEA

Avoid consecutive duty shifts
• Standard: one shift every 6 weeks

Support Duty Manager has on-call Backup as well
• Allows for escalation when primary duty manager is unavailable

PagerDuty used for scheduling & alerts
• Dashboard shows who is on call & upcoming schedule
• Individuals can customize alerts for incidents and upcoming shifts
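As a rough illustration of the "one shift every 6 weeks" standard, the sketch below builds a simple round-robin schedule for one role in one region and checks that nobody serves two weeks in a row. It is not PagerDuty's scheduling logic; the function names and the six-person pool are hypothetical.

```python
from itertools import cycle, islice


def weekly_rotation(people, weeks):
    """Assign one person per week, cycling through the pool in order."""
    return list(islice(cycle(people), weeks))


def has_consecutive_shifts(schedule):
    """Return True if anyone is on call two weeks in a row."""
    return any(a == b for a, b in zip(schedule, schedule[1:]))


# Hypothetical pool of six duty managers for a single region.
pool = ["Ana", "Ben", "Chi", "Dev", "Eli", "Fay"]
schedule = weekly_rotation(pool, 12)

for week, person in enumerate(schedule, start=1):
    print(f"Week {week}: {person}")
```

With six people in the pool, each person comes up once every six weeks; a smaller pool shortens that gap, and a pool of one would fail the consecutive-shift check.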

Page 55: Keeping it cool in a crisis (Relate Live London)

The Zencident Room

Page 56: Keeping it cool in a crisis (Relate Live London)

Red Alert - Problem & Incident Ticket

Problem Ticket
• Created internally (manually or by our Red Alert App)
• Uses Service Disruption ticket form
• Technical updates & discussions recorded here
• No risk of inadvertent customer communication
• Solves attached incidents automatically

Incident Tickets
• Customer reports - attached to the Problem ticket
• Proactive tickets attached as well
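The Problem/Incident relationship above can be pictured with a toy in-memory model. This is only a sketch of the idea and does not use the Zendesk API; the class names, fields, and status strings are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentTicket:
    """A customer report (or proactive ticket) attached to a problem."""
    requester: str
    status: str = "open"


@dataclass
class ProblemTicket:
    """Internal ticket that collects technical notes and attached incidents."""
    subject: str
    status: str = "open"
    incidents: list = field(default_factory=list)
    internal_notes: list = field(default_factory=list)

    def attach(self, incident: IncidentTicket) -> None:
        self.incidents.append(incident)

    def add_internal_note(self, text: str) -> None:
        # Notes stay on the problem ticket, so they never reach customers.
        self.internal_notes.append(text)

    def solve(self) -> None:
        # Solving the problem solves every attached incident.
        self.status = "solved"
        for incident in self.incidents:
            incident.status = "solved"


problem = ProblemTicket("Service disruption (example)")
problem.attach(IncidentTicket("customer-a"))
problem.attach(IncidentTicket("customer-b"))
problem.add_internal_note("Operations investigating.")
problem.solve()
```

Keeping technical discussion on the internal ticket and customer conversations on the attached ones is what removes the risk of an internal comment going out to a customer by accident.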

Page 57: Keeping it cool in a crisis (Relate Live London)

Service Disruption Ticket Form

Additional Fields

Added to the Problem Ticket:
• Impacted Pods
• Alert Duration
• Number of Incidents
• Link to public Help Center article
• Link to internal Operations incident record
• Checkbox: Post-mortem published

Page 58: Keeping it cool in a crisis (Relate Live London)

Communicating to Customers

Key Points

• Acknowledge Quickly
• Take Ownership
• Provide Scope and Impact
• Regular Progress Updates
• Be Transparent

Page 59: Keeping it cool in a crisis (Relate Live London)

Communication Cadence

What to Share, and When

Event: Acknowledgement of Incident
Time After Alert Called: ASAP but within 15 minutes
Notes: The sooner we acknowledge an incident publicly, the less anxious customers become.

Event: Description of incident scope and impact
Time After Alert Called: ASAP but within 30 minutes
Notes: Incident scope should be specific enough for customers to self-identify if they are impacted.

Event: Status updates on investigation/resolution
Time After Alert Called: Every 30 minutes thereafter
Notes: When possible, status updates should provide new information to demonstrate progress is being made toward resolution.

Event: "All clear"
Time After Alert Called: ASAP when reached
Notes: As soon as Operations Manager and Support Duty Manager agree.

Event: Pointer to Post-Mortem summary
Time After Alert Called: Include with "All Clear"
Notes: Post-mortems should normally be posted within 3 business days of each incident.
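The cadence above can be turned into concrete deadlines once an alert is called. The sketch below is one possible reading: it assumes "every 30 minutes thereafter" is counted from the 30-minute scope deadline, which the slide does not spell out. The function name and the `updates` parameter are hypothetical.

```python
from datetime import datetime, timedelta

ACK_WITHIN = timedelta(minutes=15)    # public acknowledgement
SCOPE_WITHIN = timedelta(minutes=30)  # scope and impact description
UPDATE_EVERY = timedelta(minutes=30)  # status updates thereafter


def communication_deadlines(alert_called: datetime, updates: int = 3):
    """Return (event, latest time) pairs for the first part of an incident."""
    deadlines = [
        ("Acknowledgement of incident", alert_called + ACK_WITHIN),
        ("Scope and impact", alert_called + SCOPE_WITHIN),
    ]
    # ASSUMPTION: status updates are counted from the scope deadline.
    for i in range(1, updates + 1):
        due = alert_called + SCOPE_WITHIN + i * UPDATE_EVERY
        deadlines.append((f"Status update {i}", due))
    return deadlines


# Example with a made-up alert time.
for event, due in communication_deadlines(datetime(2016, 1, 1, 12, 0), updates=2):
    print(f"{due:%H:%M}  {event}")
```

The "All clear" and post-mortem pointer are left out on purpose: they depend on when service is restored, not on a fixed offset from the alert.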

Page 60: Keeping it cool in a crisis (Relate Live London)

Customer Communications

Additional Details

Tweet Acknowledgement to @ZendeskOps
• Use list of sample tweets
• Marketing Suspends Tweets to @Zendesk

Respond to Customer Tickets
• As received
• Attach to Problem ticket

Publish public Help Center article
• Use article template
• Timeline and eventual post-mortem
• Location: Service Disruptions section of public KB

Proactive Communication to Top Customers
• Tickets & Phone Calls to affected customers

Timed Status Update Tweets to @ZendeskOps
• Default: every 15 minutes
• Set expectations if it will be longer
• Update Help Center article with tweets

Page 61: Keeping it cool in a crisis (Relate Live London)

“Tell them what you would want to know”

- Susan Griffin-Black, EO Products

Page 62: Keeping it cool in a crisis (Relate Live London)

Resolution & Wrap-Up

Once service is deemed restored by the incident team:

1. Ticket Resolution
• Solving the Problem ticket solves all attached Incidents
• Include link to Help Center article for post-mortem

2. Send “All Clear” Tweet
• Include link to Help Center article for post-mortem

3. Post-Mortem (within 3 business days)
• Operations team writes internal version
• Support Operations Incident Manager edits for public consumption, and publishes to Help Center article

Page 63: Keeping it cool in a crisis (Relate Live London)

Service Disruption Post-Mortem

What should it include?

• Scope and Customer Impact
• Incident Duration
• Communication Timeline
• Process / Training Gaps
• Recommendations for Improvement

Page 64: Keeping it cool in a crisis (Relate Live London)

Special Situations

Exceptions to the Rule

Security Incidents
• Involve the Security Team

Zopim Outage
• Communicate via Zopim Twitter & Facebook

Partner Outage (Twilio, GoodData)
• Report incident to Partner
• Refer to Partner’s system status page

Zendesk Support Instance is Down
• Use backup voice service (IfByPhone)
• Outgoing communication via Tweets only

Shift Handoff
• Live handoffs only - no email

Overlapping Incidents
• Separate Red Alerts & Problem Tickets

Page 65: Keeping it cool in a crisis (Relate Live London)

The Red Alert App

Custom App in our Zendesk Instance:

1. Displays links to active Red Alerts

2. Turn a ticket into a Red Alert ticket
• PagerDuty, Ticket Form & fields

3. Send Social Media Communications
• Twitter, Facebook
• Composition, templates, character counts
• Salutation and closing
• “All Clear” option adds link to Help Center article
• Creates or updates Help Center article

4. Proactive Ticket Communications
• Lists top customers, with Pod & feature info
• Composition, templates, all-clear option
• Creates or updates Incident tickets
• Updates Yammer

5. All actions add Internal Notes to Problem ticket

Page 66: Keeping it cool in a crisis (Relate Live London)

Metrics

Red Alert Impact Report (Insights)

• Number of Red Alerts
• Number of attached Incidents
• Total Support Handle Time
• Estimated Support Cost ($/ticket)
• Number of Proactive Tickets
• Top Red Alerts (by # of incidents)
• Red Alerts by “About” field
• Customer Satisfaction over time
• Process Scorecard

Page 67: Keeping it cool in a crisis (Relate Live London)

The Future!

Where No One Has Gone Before

System Status Page (in Beta)
• Loads more detail!

Improved Red Alert App
• More capabilities

Wider Proactive Communication
• More channels
• More customers

Incident Management as a Job
• Not just a role

Page 68: Keeping it cool in a crisis (Relate Live London)

Conclusion

Page 69: Keeping it cool in a crisis (Relate Live London)

WARMTH

COMPETENCE

Page 70: Keeping it cool in a crisis (Relate Live London)

KEEP CALM

AND

PLAN AHEAD
DOCUMENT IT
TRAIN
TAKE OWNERSHIP

AND

COMMUNICATE

Page 71: Keeping it cool in a crisis (Relate Live London)

Smooth Sailing

“Second star to the right, and straight on till morning…”

Page 73: Keeping it cool in a crisis (Relate Live London)

#RelateLive

Q & A