Mean Time to Innocence
Your Dashboards are Green – but your end users are still complaining. Now What?
Phil Stanhope, October 2015
30B Real-Time Steering Decisions per day
6B traceroute and RUM latency measurements per day
That’s over 6 Light years!
13 Hops per traceroute
Traffic covering 80% of ASNs on the internet seen every few minutes
52K ASNs monitored
200M BGP updates per day
No major CDN can deliver 99.9% uptime from the end user's perspective. But is it their fault?
Real Time Feeds
Cooked Time Series Data – Near Real Time
Pre-cooked across ~1000 dimensions every 5 minutes (Geography, Mobile Networks, Fixed Line Networks, Target Markets, Cities and IP Sets)
Outages & Hijacks
Pairwise Comparisons
Performance Alarms
Some Numbers
● Major Outages
• Major Impact
• Rare
● Regional Outages and Degradations
• Variable Impact
• Always Happening
“We experienced an Internet connectivity issue with a provider outside of our network which affected traffic from some end-user networks.” AWS
Business Impacting
● Consolidated view across your Internet Infrastructure
● Determine the impact to Cloud, CDN and Hosting Infrastructure globally
● Immediate time to information
What is Internet Intelligence?
Leverage Currently Deployed Dyn Assets
● Global Monitoring Infrastructure
● Custom Cloud Monitoring Infrastructure
● Real User Monitoring data
● Global Routing Infrastructure Monitors
How is it Done?
Global Monitoring Infrastructure
Reachability Markets
What is being Monitored?
Waterfalls & RUM – Where do you start?
Rather than focus on entire-page RUM and waterfalls, focus on what happens OUTSIDE of your normal span of control as a cloud, content & security consumer:
Monitor the critical content servers (CDNs both public and private)
Monitor the cloud providers, DNS providers & core SaaS providers
Give you the tooling to start answering mean time to innocence questions
Is it a problem you have the ability to address? Not if it’s your cloud provider’s transit, or the ISP’s recursive DNS.
Is your CDN provider overloaded? Is there a more generalized congestion problem on the internet?
Are the network paths to your users suboptimal – maybe even hijacked?
Can you see a micro-outage? Can you see patterns with providers?
Did a user come via a proxy gateway? Does the gateway fail to forward websockets?
Let’s Dive in – Some Context
NOTE: This is a fake URL – it won’t work for you. Sorry.
A single web page that shows a combination of real-time and near-real-time forensic data
Intentionally unbranded – what can you do with our datasets?
Covers the internal APIs that we use – they are all becoming public. Talk to me!
A common set of UX controls can be applied to a variety of real-time and batch data:
GeoViews, Sunburst, Matrix & Long-Term Trending
Under the covers: ReactJS, D3, GeoJSON/TopoJSON, jQuery, Go, Varnish, Nginx, WebSockets
Live Demo
Telemetry Data Cooking Pipeline
Users: cover 80% of the ASNs on the internet every minute
Relays: globally distributed network, handling 50K/sec per relay
Origins
DNS Recursives
Probers: network of 300+ probers performing 10K traces/second AND synthetic DIG & HTTP[S]
Gatherers: real-time geo annotation, data transformation & filtering, handling 100K/sec events
Cookers: statistical analysis and aggregation services
Geo annotated real-time API
Time Series analyzed API
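The gatherer stage described above (geo annotation, transformation and filtering of incoming events) could be sketched roughly as follows. This is a minimal illustration, not Dyn's implementation: the `Event` fields, the `PREFIX_TABLE` lookup, and the rule that drops negative latencies are all assumptions made for the example, and the addresses and ASNs come from the ranges reserved for documentation.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Iterable, Iterator, Optional

# Hypothetical prefix table standing in for a real geo/ASN database.
PREFIX_TABLE = [
    (ip_network("192.0.2.0/24"), "US", 64500),
    (ip_network("198.51.100.0/24"), "DE", 64501),
]


@dataclass
class Event:
    client_ip: str
    latency_ms: float
    country: Optional[str] = None
    asn: Optional[int] = None


def annotate(event: Event) -> Event:
    """Attach country and ASN if the client IP falls in a known prefix."""
    addr = ip_address(event.client_ip)
    for net, country, asn in PREFIX_TABLE:
        if addr in net:
            event.country, event.asn = country, asn
            break
    return event


def gather(events: Iterable[Event]) -> Iterator[Event]:
    """Drop malformed events (assumed rule: negative latency), annotate the rest."""
    for event in events:
        if event.latency_ms < 0:
            continue
        yield annotate(event)


raw = [
    Event("192.0.2.10", 42.0),
    Event("198.51.100.7", -1.0),   # filtered out
    Event("203.0.113.5", 10.0),    # unknown prefix, left unannotated
]
annotated = list(gather(raw))
```

Writing the stage as a generator keeps it streaming, which matters at the event rates quoted on the slide; a linear scan of the prefix table would need replacing with a proper longest-prefix-match structure at that scale.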
Injector & Beacon
Actors: Browser, Recursive, Authoritative, Injector & Beacon, Collect, Gatherer
GET - http://dyninsight.com/inject/CUST_ID/CUST_DATA
beacon = HMAC(secret, token)
Javascript “injection” – just like injecting an advertisement into a page
Writes a transparent iframe into the page
Loading the iframe requires resolving beacon
Guaranteed to cause recursive DNS cache miss
LOG & ANALYZE
HTTP: time, client_ip, beacon
DNS: time, recursive_ip, beacon
Collect
GET - http://beacon.dyninsight.com/CUST_ID/CUST_DATA/token
time, client_ip
Dynamic HTML - containing customer resources to test
Resources 1 .. N @ target origins tested
Resource timing information sent to collector
Per resource timing info
KEY:
token = encode(cust_id, client_ip, time, nodeid, referer)
Time – 2 – Authoritative
Time – 2 – Recursive (inferred)
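The two formulas on this slide, `token = encode(cust_id, client_ip, time, nodeid, referer)` and `beacon = HMAC(secret, token)`, together with the HTTP and DNS log lines, suggest how a client IP gets paired with the recursive resolver it used. The sketch below is one possible reading. The slide does not say how `encode` works or which hash is used, so the JSON plus URL-safe base64 encoding, SHA-256, the truncation to 32 hex characters (to stay within the 63-byte DNS label limit), and the `SECRET` value are all assumptions.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # hypothetical shared secret


def make_token(cust_id, client_ip, ts, node_id, referer) -> str:
    """Assumed encoding: compact JSON, then URL-safe base64 without padding."""
    payload = json.dumps(
        [cust_id, client_ip, ts, node_id, referer], separators=(",", ":")
    ).encode()
    return base64.urlsafe_b64encode(payload).decode().rstrip("=")


def make_beacon(secret: bytes, token: str) -> str:
    """beacon = HMAC(secret, token), truncated so it fits in one DNS label."""
    digest = hmac.new(secret, token.encode(), hashlib.sha256).hexdigest()
    return digest[:32]


def join_logs(http_log, dns_log):
    """Pair client IPs with recursive IPs that share a beacon.

    http_log rows: (time, client_ip, beacon)
    dns_log rows:  (time, recursive_ip, beacon)
    """
    recursive_by_beacon = {beacon: rip for _, rip, beacon in dns_log}
    return [
        (cip, recursive_by_beacon[beacon])
        for _, cip, beacon in http_log
        if beacon in recursive_by_beacon
    ]


token = make_token("cust42", "192.0.2.10", 1444000000, "node-1", "https://example.com/")
beacon = make_beacon(SECRET, token)
http_log = [(1.0, "192.0.2.10", beacon)]
dns_log = [(0.9, "198.51.100.53", beacon), (0.8, "198.51.100.99", "other")]
pairs = join_logs(http_log, dns_log)
```

Because every beacon is unique to one page view, the hostname cannot already be in any resolver cache, which is why the slide can promise a recursive DNS cache miss and why the join on `beacon` is unambiguous.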
Cooking – What’s going on in our Data Kitchen?
Aggregated @ 5min, 1H & 1D
MHD: raw MHD formatted data at one minute granularity
Client IP STATS Histograms: 5 minute timing histograms across 6 latency features
DNS IP STATS Histograms: 5 minute timing histograms across 6 latency features
IP Maps: Client, Recursive, Recursive Client
Client IP Sets: Typed Label IP Sets. Latencies: Country, City, Continent, ASN
DNS IP Sets: Typed Label IP Sets. Latencies: Country, City, Continent, ASN
Correlation: Scores and Ranks, daily by Origin for every TYLIP feature
All data is GEO Redundant: Gathering, Raw, Intermediates & Aggregates
Geo annotated real-time API
Gatherers
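As a rough illustration of the 5 minute timing histograms mentioned above, the following sketch buckets latency samples by 5 minute window and by a label such as country or ASN. The bin edges, the sample layout `(unix_time, label, latency_ms)`, and the function names are assumptions for the example; the slide does not describe the actual histogram layout or the six latency features.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

BUCKET_SECONDS = 300  # 5 minute windows, as on the slide
BIN_EDGES_MS = [10, 25, 50, 100, 250, 500, 1000]  # hypothetical bin edges


def bin_index(latency_ms: float) -> int:
    """Index of the histogram bin; the last bin catches everything >= 1000 ms."""
    return bisect_right(BIN_EDGES_MS, latency_ms)


def cook(samples):
    """Build {(window_start, label): Counter(bin -> count)} from raw samples."""
    histograms = defaultdict(Counter)
    for ts, label, latency_ms in samples:
        window = int(ts) // BUCKET_SECONDS * BUCKET_SECONDS
        histograms[(window, label)][bin_index(latency_ms)] += 1
    return histograms


samples = [
    (1200, "US", 5.0),
    (1250, "US", 30.0),
    (1499, "US", 7.0),
    (1500, "US", 30.0),    # falls into the next 5 minute window
    (1210, "DE", 2000.0),  # lands in the overflow bin
]
cooked = cook(samples)
```

Histograms with fixed bin edges can be summed, so the 1H and 1D aggregates could be derived from the 5 minute ones without revisiting raw data.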
QUESTIONS?