Conf2013 bchristensen thebig_t

Copyright ©2013 Ping Identity Corporation. All rights reserved.

Infrastructure Operations

Monitoring for the big T.

About Me

Roy Christensen: 35 years
3C > Honeywell > Digital Equipment Corp

Mike Christensen: 40 years
DEC > Compaq > Hewlett Packard

Beau Christensen: 17 years
AstraZeneca > Send.com > DirecTV > Ping Identity

3rd Generation Computer Geek

92 years family experience (hah)

About Ping Identity

• We believe secure professional and personal identities underlie human progress in a connected world. Our purpose is to enable and protect identity, defend privacy and secure the Internet.

• Over 1,000 companies, including over half of the Fortune 100, rely on our award-winning products to make the digital world a better experience for hundreds of millions of people.

• Denver, Colorado. Est. 2003

Infrastructure Operations @Ping

Site Reliability Engineering (SRE)
Production Web Operations
Ops to On-Demand Dev

Configuration Engineering (CFE)
Automation, Deployment, Labs
Ops to On-Premise Dev

Infrastructure Engineering (IFE)
Iron, Network, Security
Ops to Support & Helpdesk

(Quickie) Architecture

• Hybrid Cloud Application
• VMware/AWS
• SOA (lots of little services)
• Red/Black Deployment
• Cassandra & Galera MySQL

Splunk Systems

• 7 Indexers
• Distributed search across 3 regional data centers
• ~40Gib/day
• No clustering (yet)
• Universal Forwarder installed in all templates
• Moving all Splunk to clustered architecture in VPCs

Trust (the big T)

Because uptime is dead.

http://www.kitchensoap.com/2013/01/03/availability-nuance-as-a-service/

http://uptime.pingidentity.com

What Builds Trust?

Lots of Things…

Honesty

Integrity

Reliability

Dependability

Security

Transparency

Maturity

We rolled those up into two.

Reliability & Transparency

Reliability: Keep Shit Running

Keeping Shit Running

• Architecture
  – SOA (duh!!!!)
  – Highly Automated Deployments
  – Active/Active, quick failover, multi-region

• Tools
  – We use a ton of tools
  – We constantly question the tools we use
  – We are always looking for new tools

• Security
  – Eyes always on
  – Scanners running continuously
  – Constant Remediation

Transparency: Talk About Shit

Talking About Shit

• Talk Publicly
  – Talk about decisions we make
  – Talk about architecture
  – Talk to vendors
  – Talk to customers

• When Stuff Breaks, Talk More
  – Quick & dirty status updates
  – Verbose post mortems
  – Be honest

• Metrics
  – Use 3rd Parties
  – Expose as much monitoring as you can
  – Make it relevant to customers

T = r + t

Trust = reliability + transparency

Trust is a Zero Sum Game

[Charts: Trust plotted against Time, labeled in turn T = +2, T = -2, and T = 0]
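
The arithmetic behind T = r + t can be sketched in a few lines. The slides state only the formula and the chart labels T = +2, T = -2, and T = 0; scoring each term as +1 when it holds and -1 when it fails is an assumption made here to reproduce those labels, not something the deck spells out.

```python
def trust(reliability_ok: bool, transparency_ok: bool) -> int:
    """Return T = r + t on an assumed scale of +1 (holds) or -1 (fails) per term."""
    r = 1 if reliability_ok else -1
    t = 1 if transparency_ok else -1
    return r + t


print(trust(True, True))    # both reliable and transparent
print(trust(False, False))  # neither
print(trust(False, True))   # one term offsets the other
```

Under this reading, a failure in one term can only be offset by the other, which is one way to interpret the "zero sum" framing.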

Monitoring is the foundation of Trust.

Splunk is the backbone of our monitoring strategy.

<3 Monitoring

The right tool for the right job = Lots of tools.

Lots of tools = Best of Breed, or Stack Monitoring

Let’s build the stack.

Transport & Network Layer.

Fundamental interactions between services. Extremely important to understand, forms the basis of troubleshooting efforts.

“Can you ping it? Tracert? Tcpdump?”

Instance Resource Monitoring

Historical resource consumption, collected and correlated with incidents and system events. Scalability intelligence.

“Traditional” systems monitoring. Nagios, Zenoss, Zabbix, etc.

Machine Data, Logs.

Extraordinary versatility: events, alerting, reporting, security, business metrics, and operational intelligence (OI). Allows the creation of dashboards and event types relative to your own environment.

Events, logs, traditional “syslog stuff.”

Application Performance Monitoring

Detailed view of how the applications are running on top of the rest of the stack. Identify bottlenecks and architecture issues, and accurately model user experience with real user monitoring (RUM).

“Page loads are slow.” “Why are there SQL queries that return 30,000 rows?”

Availability Monitoring

External agents monitor services from multiple locations around the globe. Use application heartbeats, not ICMP or TCP. Provides near real-time alerting and is immediately visible to the customer.

“The site is down.”
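
The difference between an application heartbeat and an ICMP or TCP check can be sketched as follows. This is a minimal illustration, not the deck's actual probe: the function name and the payload shape {"status": ..., "checks": {...}} are hypothetical, and an external agent would fetch the status code and body over HTTP before calling it.

```python
import json


def heartbeat_ok(status_code: int, body: str) -> bool:
    """Return True only if the application itself reports that it is healthy.

    A reachable host (ICMP) or an open port (TCP) is not enough: the service
    must answer HTTP 200 and every dependency it reports must be "ok".
    The payload shape {"status": ..., "checks": {...}} is hypothetical.
    """
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        # Body is not valid JSON, so the application did not really answer.
        return False
    if not isinstance(payload, dict) or payload.get("status") != "ok":
        return False
    checks = payload.get("checks", {})
    return isinstance(checks, dict) and all(v == "ok" for v in checks.values())
```

A port that accepts connections while the database behind it is down would pass a TCP check but fail this one, which is the case a heartbeat is meant to catch.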

Splunk is the system’s mortar.

Binding systems together, filling gaps, creating stability.

Security

Quick visual impact. Easy to identify location, frequency, and network address of aggressor. Quick drill down into detailed Splunk searches.

Traffic

Real-time traffic maps are immediately recognizable to everyone in the organization.

Operations

Operational dashboards provide a quick overview of system health by displaying error correlation with traffic levels and event type heat maps.
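
Correlating errors with traffic levels comes down to counting both per time bucket and comparing them. A rough sketch of that calculation, outside of any dashboard tool, might look like this; the event shape (minute, HTTP status) and the choice of 5xx as "error" are assumptions for illustration.

```python
from collections import defaultdict


def error_rate_by_minute(events):
    """Map each minute to (requests, errors, error_rate) from (minute, status) pairs."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for minute, status in events:
        totals[minute] += 1
        if status >= 500:  # assumed definition of an error: any 5xx response
            errors[minute] += 1
    return {m: (totals[m], errors[m], errors[m] / totals[m]) for m in sorted(totals)}
```

Plotting the error rate next to the request count makes it easier to tell a real fault from errors that simply scale with load.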

Global Load Balance

Shows traffic distribution between production data centers.

Maximize Machine Data Layer Versatility

Build your own dashboards (it’s easy).

Maintain proper naming conventions:
na-den-www-13-8-17-0ec5e8da-128-193

Design for wetware.

Ask for feedback.

Maintain event types.
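
A consistent naming convention pays off because hostnames can be split into fields and used for grouping in searches and dashboards. The sketch below shows the idea on the example hostname. Reading the first three fields as region, site, and role is an assumption; the slide does not define the fields, so the remainder is kept unparsed.

```python
def parse_hostname(name: str) -> dict:
    """Split a hyphen-delimited hostname into labelled leading fields.

    Reading the first three fields as region, site, and role is an assumption;
    the slide does not define them. Everything after that is kept unparsed.
    """
    parts = name.split("-")
    if len(parts) < 3:
        raise ValueError(f"expected at least 3 hyphen-separated fields: {name!r}")
    region, site, role = parts[:3]
    return {"region": region, "site": site, "role": role, "rest": parts[3:]}


print(parse_hostname("na-den-www-13-8-17-0ec5e8da-128-193"))
```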

Having tons of tools is great. Each layer of the stack is matched well with the system’s requirements…

Aggregation of metrics is the new challenge. How do you see it all?

Stack Integration

eventtype=critical.*

Application integration gives faster access to the event timeline and to the context of an alarm through markers. SRE can gather relevant information more quickly, using the same tools with fewer screens.

Stack Integration = MTTR Speed
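
A search such as eventtype=critical.* only works if event types are named with a consistent dotted prefix. The sketch below mimics just the wildcard selection in plain Python, not Splunk's search semantics; the event type names in the example are hypothetical.

```python
from fnmatch import fnmatchcase


def matching_event_types(names, pattern="critical.*"):
    """Return the event type names selected by a wildcard pattern, in order."""
    return [n for n in names if fnmatchcase(n, pattern)]


# Hypothetical event type names, for illustration only.
names = ["critical.db.timeout", "warning.disk.full", "critical.auth.failure", "critical"]
print(matching_event_types(names))
```

Note that a bare "critical" does not match, because the pattern requires the dot separator; that is the kind of detail a maintained naming scheme keeps predictable.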

Splunk has a TON of integrations.

See? -->

Always Need More.

Remember…

T = r + t

Keep Shit Running.

Talk About Shit.

Use Splunk.

Questions?

Thanks!

[email protected]
@beauchristensen
www.pingidentity.com/blogs
http://status.pingidentity.com