Sustainable Logging – SplunkLive! 2014

Copyright © 2014 Splunk Inc. Sustainable Logging: SUCCEEDING WITH SPLUNK


There are several factors that will make your Splunk implementation a success. This presentation covers why our organisation implemented Splunk for log management and the steps you can take to make your implementation successful.


Page 1

Copyright © 2014 Splunk Inc.

Sustainable Logging: SUCCEEDING WITH SPLUNK

Page 2

Paul Gilowey, Foundation Technology Specialist


@paulcgt


Words and thoughts expressed herein are my own, and not those of Santam.

Page 3

www.dan-dare.org

Page 4

My technology background

Page 5

The evolution that led to Splunk

Page 6

In the beginning there was ONE.

depotwallpaper.com

Page 7

Then things got really complex.

Page 8

Page 9

Page 10

In 2012, a new project

Page 11

A big decision

It’s time to say goodbye…

Page 12

Highly distributed and integrated

Page 13

A brand new world

Diagram components: Claims, Finance, Docs, B2B, Portal, Legacy, Reverse Proxies, Load-balancers, IDM, Integration, ESM, Virtualisation, New Policy Administration, MDM

Page 14

James Wheeler souvenirpixels.com

Too many logs to monitor

Page 15

capetownstockphotos.com

So little time to trace problems

Page 16

Not only in production

https://www.flickr.com/photos/wsdot/

Page 17

On a tight timeline

Page 18

https://www.flickr.com/photos/usnavy/

December 2013, production and non-production: 20GB

Page 19

So we’re collecting log events.

Now what?

Page 20

Developers like doing things the old way

Page 21

tail -f ./catalina.out

Page 22

We like this. It’s comforting.

Page 23

Effecting change

Page 24

Choosing your champion

CTO’s Office

Splunk users (dev, ops, etc.)

Page 25

Your champion should…

• have influence across departments

• act as product owner

• be fanatical

• be hands-on

• have a development background

• be an architect

Dave Keeshan - https://www.flickr.com/photos/spudmurphy/

Page 26

Tips to help your champion

Page 27

Help developers troubleshoot (even in dev)

Ed Yourdon https://www.flickr.com/photos/yourdon/

Page 28

Change how developers think about log events

Page 29

Police lazy logging

[INFO ] Got here

[INFO ] finished loop 420

[INFO ] JDE…

[INFO ] >>>>>>>>AAAAAAAA

[INFO ] BBBBBBBBBBBBBBB

[ERROR] It failed!!!!!!
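Some of this policing can be automated. The sketch below is a hypothetical heuristic and was not presented in the talk. It assumes events use the "[LEVEL] message" layout shown above and flags a message for three reasons: it carries no key=value context, it contains runs of filler characters, or it is fewer than three words long. The function name `lazy_logging_reasons` and all thresholds are illustrative assumptions.

```python
import re

# Hypothetical heuristic: the rules and thresholds are illustrative assumptions.
LEVEL_PREFIX = re.compile(r"^\[\s*\w+\s*\]\s*")  # strips "[INFO ] ", "[ERROR] ", ...
HAS_CONTEXT = re.compile(r"\w+=\S+")            # at least one key=value pair
FILLER = re.compile(r"(.)\1{4,}")               # same character 5+ times in a row
MIN_WORDS = 3


def lazy_logging_reasons(event: str) -> list:
    """Return the reasons an event looks like lazy logging (empty list if none)."""
    message = LEVEL_PREFIX.sub("", event)
    reasons = []
    if not HAS_CONTEXT.search(message):
        reasons.append("no key=value context")
    if FILLER.search(message):
        reasons.append("filler characters")
    if len(message.split()) < MIN_WORDS:
        reasons.append("too short")
    return reasons


if __name__ == "__main__":
    for event in ["[INFO ] Got here", "[INFO ] >>>>>>>>AAAAAAAA",
                  "[INFO ] TxId=328, Invoice LineItemsTotal=420"]:
        print(event, "->", lazy_logging_reasons(event) or "ok")
```

A check like this could run as a code-review or CI step over log statements. It is a nudge rather than a gate, because a short message can still be meaningful.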

Page 30

Ops might as well be blindfolded.

https://www.flickr.com/photos/foxtongue

Page 31

Do you really want to be called at 2am?

Page 32

Demonstrate thoughtful logging

[DEBUG] TxId=328, Counting invoice line items…

[INFO ] TxId=328, Invoice LineItemsTotal=420

[DEBUG] TxId=328, Calling remote service JDE…

[TRACE] TxId=328, JDE Request: {"TxID":"328", "Items":[{"desc":"Motor Vehicle","prem":305.24},…

[WARN ] TxId=328, Timed out while calling remote service JDE… target system may be down. Will retry in 30s.
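The talk does not show the logging code behind these events, which look like output from a Java logger. As a language-neutral sketch, here is the same idea using Python's standard `logging` module. A `LoggerAdapter` prefixes every message with the transaction ID, so all events for one transaction can be correlated in Splunk. Python has no TRACE level and names the warning level `WARNING`, so the output differs slightly from the slide. `TxIdAdapter`, `make_logger` and `process_invoice` are hypothetical names.

```python
import io
import logging


class TxIdAdapter(logging.LoggerAdapter):
    """Prefix every message with the transaction ID carried in `extra`."""

    def process(self, msg, kwargs):
        return f"TxId={self.extra['tx_id']}, {msg}", kwargs


def make_logger(stream):
    """Logger that writes "[LEVEL] message" lines, similar to the slide's layout."""
    logger = logging.getLogger("invoice")
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(levelname)-5s] %(message)s"))
    logger.handlers[:] = [handler]  # replace any previously attached handlers
    logger.propagate = False
    return logger


def process_invoice(tx_id, line_items, logger, remote_available=True):
    log = TxIdAdapter(logger, {"tx_id": tx_id})
    log.debug("Counting invoice line items...")
    total = len(line_items)
    log.info("Invoice LineItemsTotal=%d", total)
    if not remote_available:
        # Say what happened, the likely cause, and what happens next.
        log.warning("Timed out while calling remote service JDE... "
                    "target system may be down. Will retry in %ds.", 30)
    return total


if __name__ == "__main__":
    buf = io.StringIO()
    process_invoice(328, ["item"] * 420, make_logger(buf), remote_available=False)
    print(buf.getvalue(), end="")
```

The adapter keeps the transaction ID out of every individual call site. Developers only have to pass it once per unit of work.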

Page 33

Show the benefit of structured log events

[INFO] Purchase complete - total=42 currency=ZAR language=en_ZA priority=13

"Purchase complete" priority<4
| stats sum(total) as currencyTotal by currency
| table currency, currencyTotal
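By default, Splunk extracts key=value pairs from raw events at search time. That is why the search above can group by `currency` without any parsing configuration. The Python sketch below shows both halves under simple assumptions:

- `format_event` is a hypothetical helper that writes fields as space-separated key=value pairs, with values that contain no spaces.
- `currency_totals` is a rough local equivalent of the search, useful for reasoning about what it returns.

Note that the example event above has priority=13, so the priority<4 filter would exclude it.

```python
import re
from collections import defaultdict

KV_PATTERN = re.compile(r"(\w+)=(\S+)")  # key=value, value runs to the next space


def format_event(message, **fields):
    """Hypothetical helper: render '[INFO] message - key=value key=value ...'."""
    pairs = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"[INFO] {message} - {pairs}"


def parse_fields(event):
    return dict(KV_PATTERN.findall(event))


def currency_totals(events, max_priority=4):
    """Rough equivalent of:
    "Purchase complete" priority<4 | stats sum(total) as currencyTotal by currency
    """
    totals = defaultdict(float)
    for event in events:
        if "Purchase complete" not in event:
            continue
        fields = parse_fields(event)
        # Events without a priority field fail priority<4, as they would in Splunk.
        if int(fields.get("priority", max_priority)) >= max_priority:
            continue
        totals[fields["currency"]] += float(fields["total"])
    return dict(totals)
```

Because every field is a plain key=value token, no regular expression has to be written in Splunk for these events. The structure is agreed once, in the logging code.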

Page 34

11 Sep 2014 15:05:27,960 [Thread-428] [DEBUG] [stm.amx.communication.outboundcommunicationmanager] za.co.santam.communication.outboundcommunicationmanager.RunnableStatusReceiver - btid=77320d33-5f8c-4178-b13e-c594816463d8, cmpid=za.co.santam.communication.outboundcommunicationmanager.RunnableStatusReceiver, uid=System, za.co.santam.communication.outboundcommunicationmanager.RunnableStatusReceiver.processStatusMessage : Status [STATUS_PROCESSING_COMPLETED = 6], will act on [STATUS_FINISHED = 1], for now only GENERATE_DIGITAL_DOCUMENT.

11 Sep 2014 15:05:36,272 [Thread-428] [DEBUG] [stm.amx.communication.outboundcommunicationmanager] za.co.santam.communication.outboundcommunicationmanager.RunnableReceiver - btid=e76665e2-e876-455a-a087-aeb5ba97d5a8, cmpid=za.co.santam.communication.outboundcommunicationmanager.RunnableStatusReceiver, uid=System, za.co.santam.communication.outboundcommunicationmanager.RunnableStatusReceiver.processMessages : Blocking(2000) read storage until message arrives...

11 Sep 2014 15:05:36,472 [Thread-427] [DEBUG] [stm.amx.communication.outboundcommunicationmanager] za.co.santam.communication.outboundcommunicationmanager.RunnableReceiver - btid=e76665e2-e876-455a-a087-aeb5ba97d5a8, cmpid=za.co.santam.communication.outboundcommunicationmanager.RunnableStorageReceiver, uid=System, za.co.santam.communication.outboundcommunicationmanager.RunnableStorageReceiver.processMessages : message received.

11 Sep 2014 15:05:36,475 [Thread-427] [TRACE] [com.tibco.amx.platform] com.tibco.governance.amxagent.msginterceptor.component.AMXGovMsgInterceptorComponent - Target URI : urn:amx:env2/stm.amx.communication.outboundcommunicationmanager/StatusReceiver_1.2.0.v2014-09-10-1604#reference(StatusReceiver_ContentManagerProxyAsync_v4_Int).

Change this…

Page 35

… into this.

Page 36

Formalise stacktrace logging policy

Function call -> Function call -> Function call -> Function call

<- Log stacktrace <- Log stacktrace <- Log stacktrace <- Log stacktrace
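The slide shows only the anti-pattern: the same stacktrace is logged at every level on the way back up the call chain. One common way to formalise the policy is assumed here, since the talk does not state an exact rule. Intermediate layers add context and re-raise without logging, and only the layer that actually handles the failure logs the stacktrace, once. The Python sketch below uses hypothetical function and exception names.

```python
import logging

logger = logging.getLogger("orders")


class OrderNotFound(Exception):
    """Hypothetical application exception used for illustration."""


def read_record(order_id):
    # Stand-in for a data-access call that fails.
    raise KeyError(order_id)


def load_order(order_id):
    try:
        return read_record(order_id)
    except KeyError as exc:
        # Intermediate layer: add context and re-raise, but do NOT log here,
        # otherwise the same stacktrace is written once per layer.
        raise OrderNotFound(f"order_id={order_id} not found") from exc


def handle_request(order_id):
    try:
        return load_order(order_id)
    except OrderNotFound:
        # The layer that handles the failure logs the full stacktrace, once.
        logger.exception("Request failed, order_id=%s", order_id)
        return None
```

Because the exception is chained with `from exc`, the single logged event still carries the original `KeyError` traceback. Nothing is lost by not logging at the lower layers.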

Page 37

Avoid filtering events.

[DEBUG] TxId=328, Real important debug statement.

[INFO ] TxId=328, This would have been useful to see...

[DEBUG] TxId=328, Useful when we really need it.

[TRACE] TxId=328, Oh man, I need this event so bad.

[DEBUG] TxId=328, Flippin’ important debug message.

[INFO ] TxId=328, This would have been useful to see...

[WARN ] TxId=328, Why am I logging at all?

Page 38

Avoid filtering events.

[WARN ] TxId=328, Real important debug statement.

[WARN ] TxId=328, This would have been useful to see...

[WARN ] TxId=328, Useful when we really need it.

[WARN ] TxId=328, Oh man, I need this event so bad.

[WARN ] TxId=328, Flippin’ important debug message.

[WARN ] TxId=328, Cummon, I *really* wanna see this!

[WARN ] TxId=328, Why am I logging at all?

Page 39

tail -f ./catalina.out

Page 40

Why developer buy-in matters

Page 41

“A fool with a tool is still a fool.” Grady Booch

Page 42

• Laughable deadlines

• Long days, longer nights

• Management pressure

Page 43

If we log excessively…

Page 44

Bob B. Brown - https://www.flickr.com/photos/beleaveme

Page 45

tail -f ./catalina.out

Page 46

Nope, no fires today, folks.

Robert du Bois https://www.flickr.com/photos/lordisgood

Page 47

No value, no money.

Neubie - https://www.flickr.com/photos/neubie/

Page 48

Shelfware.

Robert Couse-Baker https://www.flickr.com/photos/29233640@N07/

Page 49

8 steps to successful implementation

Page 50

1. Start small (but plan to grow big)

Pewstruck.com - https://www.flickr.com/photos/canoodlepets/

Page 51

2. Start with a clean slate

Page 52

3. Take a smart approach

Learn, Implement, Stabilise, Spread the word, Refine

Page 53

4. Build alerts and set policy

• Dashboards are pretty, alerts are king

• Reactive becomes proactive

• Register defects (ERROR = defect)

• Filter, don’t flood mailboxes
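Treating every ERROR as a defect only works if one recurring fault does not send hundreds of identical emails. For scheduled alerts, Splunk's own alert throttling setting handles this. The Python sketch below only illustrates the idea. It assumes that repeats of the same defect differ only in numbers such as transaction IDs. `AlertThrottle`, `signature` and the one-hour default window are hypothetical.

```python
import re
from dataclasses import dataclass, field

DIGITS = re.compile(r"\d+")


def signature(event):
    """Collapse variable numbers (IDs, counts) so repeats of one defect share a key."""
    return DIGITS.sub("N", event)


@dataclass
class AlertThrottle:
    window_seconds: int = 3600                      # hypothetical suppression window
    last_sent: dict = field(default_factory=dict)   # signature -> time of last alert

    def should_alert(self, event, now):
        """Alert on ERROR events, at most once per signature per window."""
        if "[ERROR]" not in event:
            return False
        key = signature(event)
        last = self.last_sent.get(key)
        if last is not None and now - last < self.window_seconds:
            return False
        self.last_sent[key] = now
        return True
```

The signature deliberately throws away detail, so one email stands for many events. The full set is still searchable in Splunk when someone investigates the defect.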

Page 54

5. Receive all alerts yourself

• Get a feel for the pain

• Make sure filtering is working

• Police false positives

Page 55

6. Help managers look good

• Mine their data yourself

• Find what’s difficult to show

• Build dashboards to showcase their solutions

• Broaden their minds: complement traditional BI by using log events

Page 56

7. Apply the Goldilocks Principle

“Not too hot, not too cold, just right!”

“Meh – too sloooow…”

“Too expensive!”

Page 57

8. Monitor licence usage by source or source type

index=_internal source=*metrics.log group="per_sourcetype_thruput"
| stats sum(kb) as KB by series
| where KB > 20000
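To see what the search above computes, here is a rough Python equivalent. It sums the `kb` field of `per_sourcetype_thruput` events by `series` (the source type) and keeps any source type above 20,000 KB. It assumes metrics.log lines carry comma-separated key=value fields, with `series` quoted. The sample lines in the test are simplified, hypothetical examples, not real Splunk output. The thruput groups in metrics.log are sampled and typically cover only the busiest series in each interval, so treat the result as an estimate.

```python
import re
from collections import defaultdict

# key=value pairs; a value is either "quoted" or runs until a comma or whitespace.
FIELD = re.compile(r'(\w+)=(?:"([^"]*)"|([^,\s]+))')


def parse_metrics_line(line):
    """Parse one metrics.log-style line into a dict of field -> string value."""
    return {key: quoted or bare for key, quoted, bare in FIELD.findall(line)}


def heavy_sourcetypes(lines, threshold_kb=20000):
    """Sum kb per series for per_sourcetype_thruput events; keep totals above the threshold."""
    totals = defaultdict(float)
    for line in lines:
        fields = parse_metrics_line(line)
        if fields.get("group") != "per_sourcetype_thruput":
            continue
        totals[fields["series"]] += float(fields.get("kb", 0))
    return {series: kb for series, kb in totals.items() if kb > threshold_kb}
```

To break usage down by source instead of source type, change the group to `per_source_thruput`, which covers the other half of the slide title.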

Page 58

Wrapping up

Page 59

To be successful:

• Encourage thoughtful logging

• Promote good logging practices

• Police bad behaviour

• Be intimately involved

• Adopt a helpful attitude

• Make sure you show value

Page 60

Thanks for listening!

Paul Gilowey, Foundation Technology Specialist


@paulcgt