Putting Data to Work by Splunking All the Things at Target - Gartner AADI 2012

A presentation titled "Putting Data to Work by Splunking All the Things at Target" that Dan Cundiff from Target Corporation and Leena Joshi from Splunk gave at Gartner AADI 2012.

Transcript of Putting Data to Work by Splunking All the Things at Target - Gartner AADI 2012

Page 1: Putting Data to Work by Splunking All the Things at Target - Gartner AADI 2012

Splunk Company Overview

Company (NASDAQ: SPLK)
• Founded 2004, first software release in 2006
• HQ: San Francisco / Regional HQ: London, Hong Kong
• Over 600 employees, based in 10 countries
• FY12 revenue: $121MM; FY13 guidance: $183MM
– Q2 FY13 revenue: $44.5 million

Business Model / Products
• Free download to massive scale
• Software deployed on-premise and in the cloud; Splunk Storm delivered via a SaaS model

4,400+ Customers
• Customers in over 80 countries
• 54 of the Fortune 100
• Largest license: 100 terabytes per day

Page 2

Copyright © 2012 Splunk, Inc.

Target Turns Machine Data into Application Intelligence

Leena Joshi, Splunk
Dan Cundiff, Target Corporation

Page 3

Agenda

• Splunk overview
• The machine data opportunity
• Splunk at Target
  – Why Target chose Splunk
  – Results with Splunk
  – Best practice advice

Page 4

Turn Machine Data into Application Intelligence

Page 5

Spelunking: to explore underground caves

Splunking: to explore and visualize large amounts of machine data

Page 6

Mission

Make machine data accessible, usable and valuable to everyone.

Page 7

Splunk Collects and Indexes Any Machine Data

Applications: web logs, Log4J, JMS, JMX, .NET events, code and scripts
Networking: configurations, syslog, SNMP, netflow
Databases: configurations, audit/query logs, tables, schemas
Virtualization & Cloud: hypervisor, guest OS, apps, cloud
Linux/Unix: configurations, syslog, file system, ps, iostat, top
Windows: registry, event logs, file system, sysinternals

Logfiles, configs, messages, traps, alerts, metrics, scripts, tickets, changes

Customer Facing Data: click-stream data, shopping cart data, online transaction data
Outside the Datacenter: manufacturing, logistics…, CDRs & IPDRs, power consumption, RFID data, GPS data

Page 8

Splunk Collects and Indexes Any Machine Data


• No upfront schema
• No custom connectors
• No RDBMS
• No need to filter/forward

Any amount, any location, any source.

Page 9

Turning Machine Data into Operational Intelligence

Integrated Collection, Storage and Visualization.

• Report and analyze
• Custom dashboards
• Monitor and alert
• Ad hoc search
• Real-time collection and indexing
• Developer platform

Page 10

Turning Machine Data into Operational Intelligence

Machine Data to Operational Intelligence: Integrated Collection, Storage and Visualization.

• Business Insights: Gain real-time insight from your machine data to make better-informed business decisions.
• Operational Visibility: Gain operational visibility to make better-informed IT decisions.
• Proactive Monitoring: Monitor infrastructure to identify issues, problems and attacks before they impact your customers and services.
• Search and Investigation: Find and fix problems across the organization using machine data.

Page 11

Enabling Application Intelligence for Dev & Production

Diagram: application stack (end user devices, networking/load balancing, security, web services, app servers, messaging, databases, servers, storage, virtualization, legacy systems)

• Talks to every technology in your stack
• Correlates data across the different tiers to find causal links
• Built for Big Data: visualize, analyze and trend all your data at scale

Page 12

Operational Intelligence Across Use Cases

• IT Ops
• Security
• Compliance
• Application Management
• Web Intelligence
• Business Analytics
• Internet of Things

Developer Framework

Page 13

Broad Adoption Across 4,400+ Customers
Over Half the Fortune 100

Industries: Cloud and Online Services, Education, Energy and Utilities, Financial Services & Insurance, Government, Manufacturing, Media & Entertainment, Healthcare, Travel and Leisure, Retail, Telecommunications, Technology

Page 14

Putting Data to Work by Splunking All the Things at Target

Dan Cundiff, Target Corporation

Page 15

Target Corporation

Page 16

About Me
• Technical Architect
• 7+ years of development experience across several groups: security, social media and knowledge management, and service-oriented architectures
• Currently focused on API development, creating RESTful APIs that are used in and outside of the enterprise across a wide range of devices, applications, and business partners
• Enjoy automating all the things and exchanging pro tips on continuous integration and deployment

@pmotch

Page 17

Context: Enterprise Services @ Target
• Data and transactional APIs for all the domains in our business
  – Products (inventory, price, description, etc.)
  – Locations
  – Coupons
  – etc.
• APIs exposed inside and outside
  – Mostly RESTful APIs, some pub/sub messaging
  – Used by mobile devices, applications, partners on the outside, etc.
  – Constantly evolving and rapidly improving, all the time

Page 18

Part Problem. Part Opportunity.

First API go-live:
– Millions of log events per day (grep/cut/sed/awk not cutting it)
– Logs scattered everywhere
– Limited access to logs
– Needed end-to-end visibility of web services
– Needed the ability to discover information in logs
– Can we be proactive? React faster?

Looming horizon:
– BILLIONS of log events coming
– Questions changing every day from business, support, execs, developers

Page 19

Solution. Gave Splunk a Try.
• Installed Splunk on a lab server
• Hooked up Splunk to the logs
• Quickly created 15+ searches and reports
• Generated a dashboard for visibility and trending

Total time to do all this in Splunk: ~4 hours

Page 20

Why Splunk?

Find What We Don’t Know
• Understand “Normal”
• Actionable events
• Identify tolerances
• Find things we didn’t know existed

Proactive
• Indicators of outliers, anomalies, percentage changes, standard deviations

Full Stack Visibility
• API gateway
• Network (load balancers, firewalls)
• Web/app
• OS
• Quick and flexible dashboards
• Drilldown

Community!
• Community (Splunkbase, blogs, etc.)
• Google-able™
• App store!
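The slide names the "Proactive" indicators without showing how they are computed. As a minimal sketch, assuming per-minute request counts have already been pulled out of the logs, the Python below computes a rolling z-score: a minute is flagged when it sits more than k sample standard deviations from the mean of the preceding window, and the percentage change from the previous minute is reported alongside. The window of 10, the threshold k=3 and the sample series are illustrative assumptions, not Target's settings.

```python
from statistics import mean, stdev


def flag_outliers(counts, window=10, k=3.0):
    """Return (index, value, pct_change, is_outlier) for every bucket after the first `window`.

    A bucket is an outlier when it lies more than k sample standard deviations
    from the mean of the preceding `window` buckets (a rolling z-score).
    A perfectly flat history (stdev == 0) never flags anything.
    """
    results = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        prev = counts[i - 1]
        # Percentage change versus the previous bucket; undefined when it was zero.
        pct_change = (counts[i] - prev) / prev * 100 if prev else float("inf")
        is_outlier = sigma > 0 and abs(counts[i] - mu) > k * sigma
        results.append((i, counts[i], pct_change, is_outlier))
    return results


# Hypothetical requests-per-minute series with one spike at the end.
per_minute = [100, 102, 98, 101, 99, 103, 97, 100, 101, 99, 100, 250]
for idx, value, pct, outlier in flag_outliers(per_minute):
    if outlier:
        print(f"minute={idx} requests={value} pct_change={pct:.1f}% outlier=true")
```

Within Splunk the equivalent arithmetic would be done in the search language; the sketch only shows the calculation behind an "outlier" or "percentage change" indicator.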

Page 21

Splunk delivers us a new type of intelligence.

Page 22

Understanding “Normal”

Charts: API response time SLAs; error codes by proportion; overall volume of requests; error codes by volume

All the data in one place allows us to track multiple indicators of “Normal”.
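The four charts are different views of "normal" over the same events. As a hedged illustration, assuming flat log lines that start with an ISO 8601 timestamp and carry hypothetical `api`, `status` and `response_ms` fields, the sketch below parses the key=value pairs and derives the same indicators: request volume, the share of requests inside an assumed 500 ms SLA, and 4xx/5xx error codes by volume and by proportion. The field names, the SLA and the sample lines are invented for the example.

```python
import re
from collections import Counter

# key=value pairs; a value is either double-quoted or runs to the next whitespace.
KV_PATTERN = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_kv(line):
    """Parse a flat key=value log line into a dict, dropping surrounding quotes."""
    return {key: value.strip('"') for key, value in KV_PATTERN.findall(line)}


def normal_indicators(lines, sla_ms=500):
    """Summarise volume, SLA compliance and the error-code mix for a batch of events."""
    events = [parse_kv(line) for line in lines]
    total = len(events)
    if total == 0:
        return {"total_requests": 0, "pct_within_sla": 0.0,
                "errors_by_volume": {}, "errors_by_proportion": {}}
    within_sla = sum(1 for e in events if int(e["response_ms"]) <= sla_ms)
    # Count HTTP 4xx and 5xx status codes as errors.
    errors = Counter(e["status"] for e in events if int(e["status"]) >= 400)
    return {
        "total_requests": total,
        "pct_within_sla": 100.0 * within_sla / total,
        "errors_by_volume": dict(errors),
        "errors_by_proportion": {code: 100.0 * n / total for code, n in errors.items()},
    }


# Hypothetical log lines in the flat, timestamp-first style the talk recommends.
sample = [
    "2012-09-18T10:00:01Z api=products status=200 response_ms=120",
    "2012-09-18T10:00:02Z api=products status=200 response_ms=640",
    "2012-09-18T10:00:03Z api=locations status=404 response_ms=35",
    "2012-09-18T10:00:04Z api=coupons status=500 response_ms=900",
]
print(normal_indicators(sample))
```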

Page 23

Better Understand Consumers

Who is using it, and how? What’s their experience?

Page 24

Better Understand Consumers, Part 2

Load testing in production?

Page 25

Understanding Our Infrastructure

Expected design vs. actual implementation: not balancing workload as expected
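One way to spot a load balancer that is not balancing workload as expected is to compare each node's share of requests with an even split. This is a minimal sketch under assumptions: each request event records the backend node that served it, the design intends an equal split, and a node is flagged when its share is more than 10 percentage points away from that target. Node names and counts are hypothetical.

```python
from collections import Counter


def balance_report(hosts, nodes=None, tolerance_pct=10.0):
    """Compare each node's share of requests with an even split.

    `hosts` holds the backend node recorded on each request event. Passing the
    intended `nodes` explicitly makes an idle node show up with a 0% share.
    Returns {node: (share_pct, unbalanced)}.
    """
    counts = Counter(hosts)
    nodes = sorted(nodes) if nodes else sorted(counts)
    total = sum(counts.values())
    if total == 0 or not nodes:
        return {}
    expected = 100.0 / len(nodes)
    report = {}
    for node in nodes:
        share = 100.0 * counts[node] / total
        report[node] = (round(share, 1), abs(share - expected) > tolerance_pct)
    return report


# Hypothetical traffic behind a load balancer meant to split evenly across three nodes.
observed = ["app01"] * 50 + ["app02"] * 30 + ["app03"] * 20
for node, (share, unbalanced) in balance_report(observed, nodes=["app01", "app02", "app03"]).items():
    print(f"node={node} share={share}% unbalanced={unbalanced}")
```

Listing the intended nodes matters: a node that receives no traffic at all is the most unbalanced case, yet it would never appear if the report were built only from the events.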

Page 26

Understanding Providers

How are providers responding? Is overhead added to the API response?
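Answering "Is overhead added to the API response?" comes down to subtracting provider time from end-to-end time for each request. The sketch assumes each event carries two hypothetical fields, `total_ms` (time seen at the API layer) and `provider_ms` (time spent waiting on the provider); neither field name comes from the talk.

```python
from statistics import mean, median


def api_overhead(events):
    """Estimate the time the API layer adds on top of the provider call.

    Each event is assumed to carry `total_ms` (end-to-end time at the API) and
    `provider_ms` (time spent waiting on the provider). Both names are hypothetical.
    Assumes a non-empty batch of events.
    """
    overheads = [e["total_ms"] - e["provider_ms"] for e in events]
    total_time = sum(e["total_ms"] for e in events)
    return {
        "median_overhead_ms": median(overheads),
        "mean_overhead_ms": mean(overheads),
        # Fraction of all end-to-end time that was spent outside the provider.
        "overhead_share_pct": 100.0 * sum(overheads) / total_time,
    }


sample_events = [
    {"total_ms": 150, "provider_ms": 120},
    {"total_ms": 210, "provider_ms": 170},
    {"total_ms": 95, "provider_ms": 80},
]
print(api_overhead(sample_events))
```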

Page 27

Requirements Feedback Loop

Requirement: 200 tps
Actual: ~20 tps
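Comparing a requirement such as 200 tps with observed traffic means bucketing requests by second. A minimal sketch, assuming one ISO 8601 timestamp per request (the sample timestamps are made up; `datetime.fromisoformat` needs Python 3.7 or later): it reports the busiest second and the average rate over the observed span, counting idle seconds.

```python
from collections import Counter
from datetime import datetime


def tps_stats(timestamps):
    """Return (peak_tps, average_tps) for a list of ISO 8601 request timestamps.

    Requests are bucketed by whole second. The average covers the span from
    the first to the last observed second, so idle seconds pull it down.
    """
    if not timestamps:
        return 0, 0.0
    seconds = [datetime.fromisoformat(ts).replace(microsecond=0) for ts in timestamps]
    per_second = Counter(seconds)
    span = (max(seconds) - min(seconds)).total_seconds() + 1
    return max(per_second.values()), len(seconds) / span


# Hypothetical timestamps; a real run would read them from the API logs.
sample_ts = [
    "2012-09-18T10:00:00.100",
    "2012-09-18T10:00:00.600",
    "2012-09-18T10:00:00.900",
    "2012-09-18T10:00:01.200",
    "2012-09-18T10:00:03.000",
]
peak, average = tps_stats(sample_ts)
print(f"peak_tps={peak} average_tps={average:.2f} required_tps=200")
```

Reporting both numbers is deliberate: a requirement stated as a single tps figure is usually about the peak, while capacity planning also needs the sustained average.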

Page 28

Real-time Intelligence from APIs

• Where are people searching?
• Where should we build our next store(s)?
• How far are people traveling?
• What time of day?
• Mobile vs. website?
• iOS vs. Android?
• International?
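For the "How far are people traveling?" question, one common approach is the haversine great-circle distance between where a search came from and the store it returned. This assumes the location API logs both coordinate pairs, which the talk does not say; the coordinates in the example are arbitrary.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


# Hypothetical (search origin, returned store) coordinate pairs from location-API events.
trips = [
    ((44.98, -93.27), (44.95, -93.35)),
    ((44.98, -93.27), (45.10, -93.10)),
]
distances = [haversine_miles(*origin, *store) for origin, store in trips]
print(f"average_distance_miles={sum(distances) / len(distances):.1f}")
```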

Page 29

Metrics for APIs
(source: http://blog.programmableweb.com/2012/08/02/the-api-measurement-secret-know-what-metrics-matter/)

Traffic Metrics
– Total calls
– Top methods
– Call chains
– Quota faults

Developer Metrics
– Total developer count
– # of active developers
– Top developers
– Trending apps
– Retention

Service Metrics
– Performance
– Availability
– Error rates
– Code defects

Marketing Metrics
– Developer registrations
– Developer portal funnel
– Traffic sources
– Event metrics

Support Metrics
– Support tickets
– Response time
– Community metrics

Business Metrics
– Direct revenue
– Indirect revenue
– Market share
– Costs
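The traffic metrics in the first group are simple aggregations over API events. A sketch under assumptions: each event has a hypothetical `method` field (HTTP verb plus path) and an integer `status`, and quota faults are assumed to surface as HTTP 429 (Too Many Requests) responses. Call chains would additionally need a transaction identifier and are left out.

```python
from collections import Counter


def traffic_metrics(events, top_n=3):
    """Total calls, most-called methods and quota faults for a batch of API events.

    Assumes each event has `method` (verb plus path) and an integer `status`,
    and that quota faults are logged as HTTP 429 responses.
    """
    methods = Counter(e["method"] for e in events)
    return {
        "total_calls": len(events),
        "top_methods": methods.most_common(top_n),
        "quota_faults": sum(1 for e in events if e["status"] == 429),
    }


# Hypothetical events.
api_events = (
    [{"method": "GET /products", "status": 200}] * 5
    + [{"method": "GET /locations", "status": 200}] * 3
    + [{"method": "GET /coupons", "status": 429}] * 2
    + [{"method": "POST /coupons", "status": 201}]
)
print(traffic_metrics(api_events))
```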

Page 30

In progress and future stuff.

Page 31

Splunking All the Things
• Consumer apps
• Provider systems
• OS, firewalls, proxies
• External API gateway logs
• Anything in between (middleware, integrations, etc.)
• Correlate with logs from apps degrees away (e.g. .com web logs)
• Development (perf test results, git, Jenkins/CI, wiki, etc.)

Page 32

Dashboards
• Global dashboard summarizing all APIs
• BI dashboards
• Executive dashboards

Custom dashboards for different roles bring the right information to the appropriate fingertips.

Page 33

Dashboards, Part 2

Environment dashboards for each API
– CI
– Test
– Stage
– Prod

Page 34

Dashboards, Part 3

Alert trending dashboards for each API

Page 35

Splunking Continuous Integration

Drill down into CI results linked straight from Jenkins
– Filtered by date OR transaction GUID
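A Jenkins job can link straight to the matching log events by building a search URL. The sketch below is hypothetical: the base URL, the `ci` index and the `guid` field are placeholders, and the exact Splunk Web search path varies by version. It relies only on the search language's `earliest`/`latest` time modifiers, written in Splunk's default absolute time format `%m/%d/%Y:%H:%M:%S`.

```python
from datetime import datetime
from urllib.parse import urlencode

# Splunk's default format for absolute earliest/latest time modifiers.
SPLUNK_TIME_FORMAT = "%m/%d/%Y:%H:%M:%S"


def ci_drilldown_url(base_url, index, guid=None, start=None, end=None):
    """Build a search link for one CI run, filtered by transaction GUID or by time range.

    `base_url`, `index` and the `guid` field name are placeholders for a real setup.
    """
    if guid:
        query = f'search index={index} guid="{guid}"'
    elif start and end:
        query = (
            f"search index={index} "
            f'earliest="{start.strftime(SPLUNK_TIME_FORMAT)}" '
            f'latest="{end.strftime(SPLUNK_TIME_FORMAT)}"'
        )
    else:
        raise ValueError("pass either guid or both start and end")
    return f"{base_url}?{urlencode({'q': query})}"


base = "https://splunk.example.com/app/search/search"  # hypothetical
print(ci_drilldown_url(base, "ci", guid="3f2b9c1e-0d4a-4b8e-9a55-2c7d1e6f8a90"))
print(ci_drilldown_url(base, "ci", start=datetime(2012, 9, 18), end=datetime(2012, 9, 19)))
```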

Page 36

Splunking Continuous Integration, Part 2

• We practice code as documentation
• On every commit, Jenkins runs, extracts documentation from the code, and puts it in the respective wiki pages (pretty cool! Automated, no humans)
• Splunk monitors wiki changes using the MediaWiki API
• Monitor CI + human wiki changes

https://github.com/pmotch/wikislurp

Page 37

Common Logging Service

CLS is our strategy for getting logs from all places into Splunk. How:
– Use Splunk Universal Forwarders (UFs) on endpoints everywhere
– Else, consolidate the logs and mount them for Splunk
– Else, use the CLS RESTful API

Enables end-to-end visibility
– Insert GUIDs across all the hops in the transaction

Use out-of-the-box log formats (e.g. Log4j)
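Inserting a GUID at every hop is what lets a single search reassemble a transaction end to end. A minimal sketch, assuming the GUID travels between hops in a request header (the `X-Transaction-GUID` name is hypothetical) and each hop writes a flat log line containing `guid=<value>`; the two functions stand in for real services.

```python
import uuid
from datetime import datetime, timezone

GUID_HEADER = "X-Transaction-GUID"  # hypothetical header name


def log_line(hop, guid, **fields):
    """One flat log line: ISO 8601 UTC timestamp first, then key=value pairs."""
    ts = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    pairs = " ".join(f"{k}={v}" for k, v in fields.items())
    return f"{ts} hop={hop} guid={guid} {pairs}".rstrip()


def gateway(headers, log):
    # The first hop creates the GUID unless the caller already sent one.
    guid = headers.get(GUID_HEADER) or str(uuid.uuid4())
    log.append(log_line("gateway", guid, path="/products"))
    return provider({GUID_HEADER: guid}, log)


def provider(headers, log):
    # Downstream hops reuse the incoming GUID instead of minting a new one.
    guid = headers[GUID_HEADER]
    log.append(log_line("provider", guid, status=200))
    return guid


events = []
transaction_guid = gateway({}, events)
print("\n".join(events))
# Searching for guid=<transaction_guid> would return every hop of this transaction.
```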

Page 38

Best Practice Advice

Page 39

Lessons

RTFM
– Keep logs flat
– Keep the timestamp (ISO 8601) at the beginning
– Use k=v pairs
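The three logging rules (flat lines, ISO 8601 timestamp first, k=v pairs) can be applied with a standard library alone. The talk points at Log4j on the Java side; this Python `logging.Formatter` subclass is an analogous sketch, not Target's configuration. It writes UTC timestamps with millisecond precision, then `level`, `logger`, `msg` and any extra pairs passed through a hypothetical `kv` attribute, quoting values that contain spaces.

```python
import logging
import time


class KeyValueFormatter(logging.Formatter):
    """One flat line per event: ISO 8601 UTC timestamp first, then key=value pairs."""

    converter = time.gmtime  # render record times in UTC

    def format(self, record):
        ts = self.formatTime(record, "%Y-%m-%dT%H:%M:%S") + f".{int(record.msecs):03d}Z"
        fields = {"level": record.levelname, "logger": record.name, "msg": record.getMessage()}
        # Extra pairs arrive via logger.info(..., extra={"kv": {...}}).
        fields.update(getattr(record, "kv", {}))
        pairs = " ".join(
            f'{k}="{v}"' if " " in str(v) else f"{k}={v}" for k, v in fields.items()
        )
        return f"{ts} {pairs}"


logger = logging.getLogger("products-api")
handler = logging.StreamHandler()
handler.setFormatter(KeyValueFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("request served", extra={"kv": {"status": 200, "response_ms": 42}})
```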

Iterate quickly, push to prod; make minimal tweaks to Splunk

Flatten out-of-the-box audit events (XML)
– Toggle at runtime
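Flattening XML audit events turns nested elements into one line of key=value pairs. A minimal sketch using `xml.etree.ElementTree`: keys are element paths joined with underscores, attributes extend their element's path, and repeated sibling elements simply produce repeated keys. The sample audit record and the naming scheme are assumptions; real audit XML will differ, and attribute and child names that collide are not disambiguated.

```python
import xml.etree.ElementTree as ET


def flatten_xml(xml_text, sep="_"):
    """Flatten a nested XML event into a single line of key=value pairs.

    Keys are element paths joined with `sep`; attributes extend their element's
    path. Repeated siblings produce repeated keys (multi-valued fields).
    """
    root = ET.fromstring(xml_text)
    pairs = []

    def walk(elem, path):
        for name, value in elem.attrib.items():
            pairs.append((f"{path}{sep}{name}", value))
        children = list(elem)
        text = (elem.text or "").strip()
        if text and not children:
            pairs.append((path, text))
        for child in children:
            walk(child, f"{path}{sep}{child.tag}")

    walk(root, root.tag)
    return " ".join(f'{k}="{v}"' if " " in v else f"{k}={v}" for k, v in pairs)


# Hypothetical audit event; real out-of-the-box audit XML will differ.
audit_xml = (
    '<audit id="42"><user>jdoe</user>'
    '<action type="update"><resource>/products/123</resource></action></audit>'
)
print(flatten_xml(audit_xml))
```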

Don’t reinvent the wheel: use what your system provides. Splunk can handle it!

Page 40

Lessons, Part 2

Don’t pre-optimize up front:
– Governance
– Standards
– Alerting
– Access controls

Optimize as needed.

Page 41

Lessons, Part 3

Create a community

Page 42

Lessons, Part 4

Create best practices, standards, etc. in a wiki

Page 43

Challenges: Organizational

“Stop. We already have tools that do this. Use those.”
– tgtMAKE saves the day
– tgtMAKE = R&D
– R&D = $, servers, flak shelter, people network

“Make it real” strategy
– Demo to as many key players as possible
– Drum up interest
– Show actual value

Page 44

Challenges: Organizational, Part 2

The data can’t be trusted?

Page 45

Recap

Be bold. Tooling matters. Sell it.

Splunk all the things!

Iterate, adapt, change quickly.

Page 46

We’re hiring

(come talk to me)

Page 47

Resources
• Speaker emails: dan.cundiff AT target.com, ljoshi AT splunk.com
• Splunk download: www.splunk.com/goto/download
• Splunk Storm SaaS service: www.splunkstorm.com/

Page 48

Thank You