Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Am I in the right Session… and Who is this guy?
Goal of Presentation: Search Optimization • How the hell do I speed this search up?
Background of your Presenter: Julian Harty • Splunker for 2+ Years – Variety of installations from 10GB to 100TB+
• Ex-Oracle/MySQL DBA (Recovering) • Contact info: [email protected]
Background – Great to Not So Great
Growth without optimization = suboptimal performance
-> our goal: getting great performance at scale
• More Data • More Users • New Searches • Even More Data
• Even More Users • Even More Searches…
Optimization Steps
Challenge – Why so slow? The maturity of a Splunk deployment
Question? Is your environment tuned correctly?
Question? Has your deployment been architected correctly?
Question? Are your searches optimized?
Solution: Architecting And Designing Your Splunk Deployment - Simeon Yep
Solution: Jiffy Lube Quick Tune Up For Your Splunk Environment – Sean Delaney
Solution: Welcome to this session!!!
Agenda: Objectives of this Session
• The Basics: • Common pitfalls – Best practices and what not to do • Take away: Basic steps to a better search
• Beyond the Basics: • Search Architecture and Workflow • Detailed Search review – using Job Inspector search examples • Take away: Job Inspector Cheat Sheet
• Q&A
SOS – Expensive Searches: Search Activity, Usage Patterns
-> SOS -> Search -> Search Detail Activity -> Expensive Searches
The Basics: Common Search Behavior
General rule: be=selective AND be=specific, and narrow the time range.

Bad Behavior -> Good Behavior (Performance Improvement – Comment):
• > *  ->  > host=web sourcetype=access* (e.g., index=xyz source=www): 10–50% – use indexes and default fields
• All Time searches  ->  narrow time range (e.g., -24h@h): 30x–365x – limit time range
• > foo | search bar  ->  > foo bar: 30% – combine searches
• Verbose Mode  ->  use Fast/Smart Mode where possible: 20–50% – fast mode
• A NOT B  ->  A AND C AND D AND E: 5–50% – avoid NOTs
• Searches over large datasets  ->  use Report Acceleration (or Data Models): 1000% – use intelligently
• Searches over long periods  ->  use Summary Indexing: 1000% – use sparingly
The Basics: Common Optimization Mistakes
• Summary indexing is Awesome! – Initial reaction – Summarize EVERYTHING!!!
  -> Summarizing too much data negates the point
• Report Acceleration = Turbo button – Initial reaction – Report Accelerate EVERYTHING!!!
  -> Too many accelerated searches = skipped search issues
• Data Models are the answer! – Initial reaction – everything can be included!
  -> Convoluted data models can increase workload
How do you enforce Best Practices?
Architect Perspective:
• User education – Best Practices for Users
Admin Perspective: Restricting User Controls – pulling in the reins
• Restricting Role Capabilities
  • Limit indexes • Limit search terms • Limit search time range
• Limiting the Power user role • Restrict number of real-time and concurrent searches
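As a sketch of what restricting role capabilities can look like, here is a hypothetical authorize.conf stanza (the role name and values are illustrative; attribute names should be verified against the authorize.conf spec for your Splunk version):

```ini
# Hypothetical role that pulls in the reins on what and how much users can search
[role_limited_power]
# Only these indexes are searchable by the role
srchIndexesAllowed = main;web
# Cap the search time window (in seconds; here 24 hours)
srchTimeWin = 86400
# Limit concurrent historical searches
srchJobsQuota = 3
# Limit concurrent real-time searches
rtSrchJobsQuota = 1
```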
How do you enforce Best Practices?
Admin Perspective: • Time range defaults (ui-prefs.conf) • Time range Web dropdown options (times.conf)
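For example, default time ranges can be tightened in configuration. The snippets below are illustrative only; exact stanza and attribute names should be checked against ui-prefs.conf.spec and times.conf.spec for your version:

```ini
# ui-prefs.conf: make searches default to the last 24 hours instead of All Time
[search]
dispatch.earliest_time = -24h@h
dispatch.latest_time = now

# times.conf: offer bounded presets in the time-range dropdown
[last_4_hours]
label = Last 4 hours
earliest_time = -4h
latest_time = now
order = 10
```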
How Search Works – Physical Perspective
[Diagram: physical layout of an index – each index (main, _internal, history) contains time-bounded buckets named db_<latestTime>_<earliestTime>_<id> (e.g., db_1290057665_1289504696_1); each bucket holds .tsidx index files, metadata files (Sources.data, SourceTypes.data, Hosts.data), and compressed rawdata (.gz) slices.]
How Search Works – Logically: Search Query Structure
Parse, Fetch, Summarize, Display

> index=mydata | eval loc=long+lat+alt | stats count
  (retrieve events -> filter/transform/map)
Splunk Distributed Search
4 Steps to a Splunk Search: Parse, Fetch, Summarize, Display
• StreamingCommand: Applies a transformation to search results as they travel through the processing pipeline. Examples: eval, rex, where…
• ReportingCommand: Processes search results and generates a reporting data structure. Examples: stats, top, and timechart…
Types of Searches
• Dense – Low cardinality – Example: sourcetype=access method=GET
• Sparse – High cardinality – Example: sourcetype=access method=GET action=purchase
• Super Sparse (or Needle in a Haystack) – Very high cardinality – Example: sourcetype=cisco:asa action=denied src=10.2.3.11
• Rare – Use Case: user behavior tracking – Example: sourcetype=magicsource | rare
[Diagram: search density spectrum – Dense -> Sparse -> Super Sparse]
Dense Searches (>10% matching results; compare scanCount vs eventCount in Job Inspector)
Challenge:
• CPU- and I/O-bound
  – Initial spike in CPU due to decompression of raw events
  – Retrieval rate: 50K events per second per server
Solution:
• Divide and conquer
  – Distribute the search to an indexing cluster – Parallel compute and merge results
• Report Acceleration or use of summaries – divide and conquer – Report on summarized data vs. raw data
> sourcetype=access_combined method=GET
Sparse Searches
Challenge:
• CPU-bound
  – Dominant cost is uncompressing *.gz raw data files – Sometimes need to read far into a file to retrieve a few events
Solution:
• Avoid cherry picking
  – Be selective about exclusions (avoid "NOT foo" or "field!=value") – Leverage indexed fields
• Filter using whole terms
  – Instead of > sourcetype=access_combined clientip=192.168.11.* – Use > sourcetype=access_combined clientip=TERM(192.168.11.2)
> sourcetype=access_combined status=404
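Why whole terms beat trailing wildcards: a TERM() filter can be answered with a single probe of a bucket's lexicon, while a wildcard has to walk it. A toy Python model of the idea (the terms and postings are made up, and real tsidx files are far more involved than a dict):

```python
import fnmatch

# Toy model of a bucket's lexicon: term -> event offsets (postings)
lexicon = {
    "192.168.11.2": [1, 7, 9],
    "192.168.11.25": [3],
    "10.0.0.1": [2, 4],
    "GET": [1, 2, 3, 4, 7, 9],
}

def term_lookup(term):
    """TERM(...)-style filter: one exact probe of the lexicon."""
    return lexicon.get(term, [])

def wildcard_lookup(pattern):
    """Trailing-wildcard filter: must walk every term in the lexicon."""
    hits = []
    for term, postings in lexicon.items():
        if fnmatch.fnmatch(term, pattern):
            hits.extend(postings)
    return sorted(hits)

print(term_lookup("192.168.11.2"))      # [1, 7, 9]
print(wildcard_lookup("192.168.11.*"))  # [1, 3, 7, 9]
```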
Super Sparse Searches
• "Needle in a Haystack"
• Very I/O intensive
• May take up to 2 seconds to parse each bucket

> sourcetype=access_combined status=404 10.2.1
Rare Term Searches
• Bloom Filters*
  – Bloom filters stored in each bucket
  – 50 buckets processed per second
  – I/Os reduced as buckets are excluded: from 100–200 down to just a few
  – 50–100x faster than Super Sparse searches on conventional storage, >1000x faster on SSD (due to random reads)
> sourcetype=access_combined sessionID=1234
* A Bloom filter is a data structure designed to tell you whether or not an element is present in a set
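A minimal sketch of the idea (not Splunk's actual implementation): a Bloom filter can answer "definitely not here" without any rawdata I/O, so the search can skip whole buckets and only open the few that might contain the term.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: false positives are possible, false negatives are not."""
    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = 0  # an integer used as a bit array

    def _positions(self, term):
        # Derive num_hashes bit positions from salted SHA-256 digests
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, term):
        for pos in self._positions(term):
            self.bits |= 1 << pos

    def might_contain(self, term):
        # False means the term is definitely absent -> skip this bucket
        return all(self.bits & (1 << pos) for pos in self._positions(term))

# One filter per bucket: only open the bucket when the answer is True
bucket = BloomFilter()
bucket.add("sessionID=1234")
print(bucket.might_contain("sessionID=1234"))  # True -> worth reading the bucket
```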
How can I determine if my search is Dense or Sparse? Use Job Inspector…
Component – Description:
• scanCount – The number of events that are scanned or read off disk.
• eventCount – The number of events that are returned to the base search.
• For dense searches scanCount ~= eventCount. • For sparse searches, scanCount >> eventCount.
> sourcetype=access_combined status=404 81.11.191.113
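The rule of thumb above can be expressed directly. This helper is hypothetical (not a Splunk API) and uses the >10% threshold from the previous slide to label a search from its Job Inspector counters:

```python
def classify(scan_count, event_count, dense_threshold=0.10):
    """Heuristic density label from Job Inspector's scanCount/eventCount."""
    if scan_count == 0:
        return "nothing scanned"
    ratio = event_count / scan_count
    # dense: scanCount ~= eventCount; sparse: scanCount >> eventCount
    return "dense" if ratio >= dense_threshold else "sparse"

print(classify(scan_count=1_000_000, event_count=400_000))  # dense
print(classify(scan_count=1_000_000, event_count=120))      # sparse
```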
Measuring Search Using the Splunk Search Inspector
Timings from distributed peers
Timings from the search command
Key Metrics: • Completion Time • Number of Events Scanned • Search SID
Job Inspector
Job Inspector Walkthrough – Search Command
Rawdata: Improving I/O and CPU load
KV: Are field extractions efficient?
Lookups: Used appropriately? Autolookups causing issues?
Typer: Inefficient eventtypes
Alias: Cascading aliases
Reading Job Inspector – search.index
search.index = Time to parse and read the tsidx files to determine where to read in rawdata
How do you optimize this?
• Improving I/O
Reading Job Inspector – search.rawdata
search.rawdata = Time to read actual events from rawdata files
How do you optimize this?
• Filtering as much as possible
• Add peers
• Allocating more CPU, improving I/O
Reading Job Inspector – search.kv
search.kv = Time taken to apply field extractions to events
How do you optimize this? Regex optimizations:
• Avoid unbounded greedy operators; prefer lazy quantifiers (.*?) or explicit character classes
• Use anchors (^ $)
• Use non-capturing groups for repeats
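To illustrate these regex tips, a small Python example on a made-up access-log line: the anchored pattern uses explicit character classes and a non-capturing group, pinning the match to the start of the event instead of letting a greedy .* scan and backtrack across it.

```python
import re

# Made-up access-log event (illustration only)
line = '10.2.1.33 - - [07/Oct/2014:12:00:01] "GET /cart HTTP/1.1" 404 512'

# Unanchored and greedy: the engine scans the event and backtracks heavily
greedy = re.compile(r'.*"(.*) (.*) (.*)"')

# Anchored, explicit classes, non-capturing group: each piece is bounded,
# so the engine does little backtracking
anchored = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+)')

print(anchored.search(line).group(1))  # /cart
print(bool(greedy.search(line)))       # matches too, at higher matching cost
```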
Reading Job Inspector – search.lookups
search.lookups = Time to apply lookups to the search
How do you optimize this?
• Use appropriately (at the end of the search)
• Autolookups may be causing issues
Reading Job Inspector – search.typer and tags
search.typer = Time to apply event types to the search
How do you optimize this?
• Use appropriately
• Remove unused tags and eventtypes
Job Inspector Walkthrough – Distributed Search
dispatch.createProviderQueue – Time to establish connections with peers
dispatch.fetch – Time spent waiting to fetch events
dispatch.evaluate – Time spent parsing the search and setting up the data structures needed to run the search
How do you optimize this?
• Improving peer connectivity
• Improving bundle replication
• Faster storage
Job Inspector Walkthrough – Distributed Search
dispatch.stream.remote – Time to retrieve events from each remote peer
Issues:
1. Unequal indexer performance
   • Either hardware mismatch • Uneven distribution of indexes
2. AutoLB issues
Job Inspector Conclusions: Search Command Summary

Component – Description:
• index – look in tsidx files for where to read in rawdata
• rawdata – read actual events from rawdata files
• kv – apply field extractions to the events
• filter – filter out events that don’t match (e.g., fields, phrases)
• alias – rename fields according to props.conf
• lookups – create new fields based on existing field values
• typer – assign eventtypes to events
• tags – assign tags to events
Job Inspector Conclusion: Distributed Search Summary

Metric – Description – Area to review:
• createProviderQueue – The time to connect to all search peers – Peer connectivity
• fetch – The time spent waiting for or fetching events from search peers – Faster storage
• stream.remote – The time spent executing the remote search in a distributed search environment, aggregated across all peers
• evaluate – The time spent parsing the search and setting up the data structures needed to run the search – Possible bundle issues
Additional Key Logfiles related to search
Search log:
• Stored in $SPLUNK_HOME/var/run/splunk/dispatch/
• Detailed analysis of every step taken by the search
• Search ‘stack trace’
Stats vs Transaction

Search Goal: compute statistics on the duration of web sessions (JSESSIONID = unique identifier):

Not so Great:
> sourcetype=access_combined | transaction JSESSIONID | chart count by duration span=log2

Much Better:
> sourcetype=access_combined | stats range(_time) as duration by JSESSIONID | chart count by duration span=log2
Dedup vs Latest

Search Goal: Return the latest cart action for each web site customer

Not so Great:
> sourcetype=access* | dedup clientip sortby -_time | table clientip, action

Much Better:
> sourcetype=access* | stats latest(action) by clientip

Note: dedup can’t be used with report acceleration.
Joins and Subsearches

Search Goal: Return the latest values per JSESSIONID across two sourcetypes

Not so Great:
> sourcetype="access_combined" | join type="inner" JSESSIONID [search sourcetype="applogs" | dedup JSESSIONID | table JSESSIONID, clientip, othervalue]

Much Better:
> (sourcetype="access_combined") OR (sourcetype="applogs") | stats latest(*) as * by JSESSIONID
In Closing…

1. Implementing architecture best practices for performance at scale • With search behavior in mind…
2. Implementing user onboarding best practices • Basic optimization steps
3. Periodic performance review • Applying acceleration technologies where appropriate • Removing unused searches
4. Review additional slides for • Search flow detail • Optimizing Splunk Web
And By the way…

Other Sessions to look out for:
• How to Actually Use Splunk Data Models – David Clawson
  Presented on Tuesday – Check out the session notes
• Jiffy Lube Tune-Up for your Splunk Deployment – Sean Delaney
  Presented on Tuesday – Check out the session notes
• Architecting and Sizing your Splunk Environment – Simeon Yep
  2:15–3:15 Today
• Splunk Search Acceleration Technologies – Gerald Kanapathy
  10:30–11:30 Tomorrow

My Contact information:
[email protected] @julian_Harty
Take Away: Basic Steps to a better search

• Avoid use of * wherever possible.
• Avoid the use of All Time.
• Avoid subsearches.
• Incorporate the use of default fields (source, sourcetype, host) as well as specific indexes in every search (where possible).
• Use Fast or Smart mode where possible; avoid Verbose mode.
• Use Report Acceleration sparingly (and strategically) on reports over large datasets.
• Use Summary Indexing when building reports over time spans beyond target index retention.
• Use the Job Inspector and Search Inspector to get more info (hold on for more details!!!)
A few notes on how to optimize Splunk Web

• | fields
• Change segmentation
• Use Fast Mode
• Collapse the timeline