Cloudpocalypse We put “fail” in failover Vlad Mazek, MCSE CEO, Own Web Now Corp...

Post on 04-Jan-2016

Cloudpocalypse
We put “fail” in failover

Vlad Mazek, MCSE
CEO, Own Web Now Corp

vlad@ownwebnow.com
facebook.com/vladmmd

@vladmazek
Cell: (407) 536-VLAD

Agenda

• Summary of events
• What to tell your clients about the outage
• Our current network design
• What failed?
• What we are doing to address it

Power Infrastructure

So what failed?

ATS: Automatic Transfer Switch

An electrical switch that transfers the load from its primary power source to a standby source.
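
The definition above can be sketched as a few lines of decision logic. This is a toy model only, not the data center's actual equipment or control logic: the class name `TransferSwitch`, the `update` method, and the feed labels are all hypothetical, and it assumes the switch simply prefers the primary feed whenever that feed is healthy.

```python
from dataclasses import dataclass


@dataclass
class TransferSwitch:
    """Toy model of an automatic transfer switch (hypothetical names)."""

    active: str = "primary"  # which feed currently carries the load

    def update(self, primary_ok: bool, standby_ok: bool) -> str:
        """Pick the feed for the load based on the health of each source."""
        if primary_ok:
            # Assumption: always return to the primary feed when it is healthy.
            self.active = "primary"
        elif standby_ok:
            # Primary is down, so transfer the load to the standby source.
            self.active = "standby"
        else:
            # Neither source is available, so the load is unpowered.
            self.active = "none"
        return self.active


ats = TransferSwitch()
print(ats.update(primary_ok=False, standby_ok=True))
```

Note what the sketch leaves out: it assumes the switch itself always works. When the ATS is the component that fails, both power sources can be healthy and the load still goes dark.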

Summary of Events

• 12:04 Power failure
• 1:34 ATS replacement advised by DC
• 2:00 Partial power restored
• 4:10 First ETA issued: 6:30 PM
• 4:30 Emergency systems start coming online
• 4:46 DC offers additional details on the problem
• 5:10 Restored Exchange 2010 clusters
• 7:10 DC restores power

How this really felt

Impact

• This is the first major issue with the Dallas DC in over a decade

• We moved our critical systems to Dallas from California and Florida due to the weather and power issues

• This has adjusted our roadmap for service delivery

Agenda

• Extend LiveArchive to a second DC
• Extend Exchange 2010 hosting to additional data centers
• Improve our communications across partner networks
  – Facebook: ExchangeDefender
  – Twitter: @xdnoc @ExchangDefender

What can I tell my clients?

• Power issues happen.
• There will be a partial refund.
• There is no additional support cost.
• The company is going to improve the solution.
• The uptime record thus far has been impressive.
• Complex systems lead to complex problems and aren’t you glad you don’t have to worry about it?

What next?

• Look for an email from me in the morning.
• Advise customers about LiveArchive.
• Stay tuned for network enhancements.
• Keep the issue in perspective: This isn’t Microsoft’s fault or general negligence/incompetence, it’s a massive failure.

Something funny…

You know why I don’t trust the cloud? It’s still powered by guys whose butt cracks show when they squat to fix an electrical issue.