
Splunk Knowledge Manager Manual

Version: 4.1.7

Generated: 2/16/2011 03:57 pm

Copyright Splunk, Inc. All Rights Reserved

Table of Contents

Welcome to knowledge management
• What is Splunk knowledge?
• Why manage Splunk knowledge?
• Prerequisites for knowledge management

Organize and administrate knowledge objects
• Curate Splunk knowledge with Manager
• Develop naming conventions for knowledge objects
• Understand and use the Common Information Model

Data interpretation: Fields and field extractions
• About fields
• Overview of search-time field extraction
• Use the Field extractions page in Manager
• Use the Field transformations page in Manager
• Create and maintain search-time field extractions through configuration files
• Configure multivalue fields

Data classification: Event types and transactions
• About event types
• Define and maintain event types in Splunk Web
• Configure event types directly in eventtypes.conf
• Configure event type templates
• About transactions
• Search for transactions
• Define transactions

Data enrichment: Lookups and workflow actions
• About lookups and workflow actions
• Look up fields from external data sources
• Create workflow actions in Splunk Web
• Configure workflow actions through workflow_actions.conf

Data normalization: Tags and aliases
• About tags and aliases
• Define and manage tags
• Create aliases for fields
• Tag the host field
• Tag event types

Manage your search knowledge
• Manage saved searches
• Configure the priority of scheduled searches
• Design macro searches
• Design form searches
• Define navigation to saved searches and reports

Set up and use summary indexes
• Use summary indexing for increased reporting efficiency
• Manage summary index gaps and overlaps
• Configure summary indexes

Welcome to knowledge management

What is Splunk knowledge?

Splunk is a powerful search and analysis engine that helps you see both the details and the larger patterns in your IT data. When you use Splunk you do more than just look at individual entries in your log files; you leverage the information they hold collectively to find out more about your IT environment.

Splunk automatically extracts different kinds of knowledge from your IT data--events, fields, timestamps, and so on--to help you harness that information in a better, smarter, more focused way. Some of this information is extracted at index time, as Splunk indexes your IT data. But the bulk of this information is created at "search time," both by Splunk and its users. Unlike databases or schema-based analytical tools that decide what information to pull out or analyze beforehand, Splunk enables you to dynamically extract knowledge from raw data as you need it.
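For example, you can pull a brand-new field out of raw events in the middle of a search, with no prior configuration. Here's a minimal sketch using the rex search command; the sourcetype and the user= pattern in the events are hypothetical stand-ins for your own data:

    sourcetype=access_combined | rex "user=(?<username>\w+)" | top username

The username field here exists only for the duration of the search; nothing about the index changes.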

As your organization uses Splunk, additional categories of Splunk knowledge objects are created, including event types, tags, lookups, field extractions, workflow actions, and saved searches.

You can think of Splunk knowledge as a multitool that you use to discover and analyze various aspects of your IT data. For example, event types enable you to quickly and easily classify and group together similar events; you can then use them to perform analytical searches on precisely-defined subgroups of events.
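To make that concrete, here's a minimal sketch of how an event type looks under the hood in eventtypes.conf; the name and search string are hypothetical:

    [failed_login]
    search = "failed password" OR "authentication failure"

Once the event type is defined and shared, any user can search on eventtype=failed_login rather than retyping (or even knowing) the underlying search string.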

If you've read the User manual, you know that it covers Splunk knowledge basics in its "Capture knowledge" chapter. The Knowledge Manager manual goes into more depth. It shows you how to maintain sets of knowledge objects for your organization (through Manager and configuration files) and demonstrates ways that Splunk knowledge can be used to solve your organization's real-world problems.

Splunk knowledge is grouped into five categories:

Data interpretation: Fields and field extractions - Fields and field extractions make up the first order of Splunk knowledge. The fields that Splunk automatically extracts from your IT data help bring meaning to your raw data, clarifying what can at first glance seem incomprehensible. The fields that you extract manually expand and improve upon this layer of meaning.

Data classification: Event types and transactions - You use event types and transactions to group together interesting sets of similar events. Event types group together sets of events discovered through searches, while transactions are collections of conceptually-related events that span time.

Data enrichment: Lookups and workflow actions - Lookups and workflow actions are categories of knowledge objects that extend the usefulness of your data in various ways. Field lookups enable you to add fields to your data from external data sources such as static tables (CSV files) or Python-based commands. Workflow actions enable interactions between fields in your data and other applications or web resources, such as a WHOIS lookup on a field containing an IP address. (A lookup configuration sketch follows this list.)

Data normalization: Tags and aliases - Tags and aliases are used to manage and normalize sets of field information. You can use tags and aliases to group sets of related field values together, and to give extracted fields tags that reflect different aspects of their identity. For example, you can group events from a set of hosts in a particular location (such as a building or city) together--just give each host the same tag. Or maybe you have two different sources using different field names to refer to the same data--you can normalize your data by using aliases (by aliasing clientip to ipaddress, for example).

Saved searches - Saved searches are another category of Splunk knowledge. Vast numbers of saved searches can be created by Splunk users within an organization, and thoughtful saved search organization ensures that they are discoverable by those that need them. There are also advanced uses for saved searches: they are often used in dashboards, can be turned into reusable search macros, and more.
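As promised in the data enrichment entry above, here is a minimal sketch of a CSV-based field lookup, assuming a hypothetical lookup table http_status.csv with status and status_description columns. The lookup table is registered in transforms.conf:

    [http_status]
    filename = http_status.csv

You could then enrich events on the fly with a search such as:

    sourcetype=access_combined | lookup http_status status OUTPUT status_description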

The Knowledge Manager manual also includes a chapter on summary indexing. Summary index setup and oversight is an advanced practice that can benefit from being handled by users in a knowledge management role.

At this point you may be asking the question "Why does Splunk knowledge need to be 'managed' anyway?" For answers, see "Why manage Splunk knowledge?", the next topic in this chapter.

Knowledge managers should have at least a basic understanding of data input setup, event processing, and indexing concepts. For more information, see "Prerequisites for knowledge management," the third topic in this chapter.


Why manage Splunk knowledge?

If you have to maintain a fairly large number of knowledge objects across your Splunk deployment, you know that management of that knowledge is important. This is especially true of organizations that have a large number of Splunk users, and even more so if you have several teams of users working with Splunk. This is simply because more users inevitably produce more Splunk knowledge.

When you leave a situation like this unchecked, your users may find themselves sorting through large sets of objects with misleading or conflicting names, struggling to find and use objects that have unevenly applied app assignments and permissions, and wasting precious time creating objects such as saved searches and field extractions that already exist elsewhere in the system.

Knowledge managers provide centralized oversight of Splunk knowledge. The benefits that knowledge managers can provide include:


Oversight of knowledge object creation and usage across teams, departments, and deployments. If you have a large Splunk deployment spread across several teams of users, you'll eventually find teams "reinventing the wheel" by designing objects that were already developed by other teams. Knowledge managers can mitigate these situations by monitoring object creation and ensuring that useful "general purpose" objects are shared on a global basis across deployments.

For more information, see "Curate Splunk knowledge with Manager" in this manual.

Normalization of event data. To put it plainly: knowledge objects proliferate. Although Splunk is based on data indexes, not databases, the basic principles of normalization still apply. It's easy for any robust, well-used Splunk implementation to end up with a dozen tags that have all been applied to the same field, but as these redundant knowledge objects stack up, the end result is confusion and inefficiency on the part of its users. We'll provide you with some tips about normalizing your knowledge object libraries by applying uniform naming standards and using Splunk's Common Information Model.

For more information, see "Develop naming conventions for knowledge objects" in this manual.

Management of knowledge objects through configuration files. True knowledge management experts know how and when to leverage the power of Splunk's configuration files when it comes to the administration of Splunk knowledge. There are certain aspects of knowledge object setup that are best handled through configuration files. This manual will show you how to work with knowledge objects this way.

See "Create search time field extractions" in this manual as an example of how you canmanage Splunk knowledge through configuration files.

Setup and organization of app-level navigation for saved searches and reports, as well as views and dashboards. Left unmoderated, the navigation for saved searches, reports, views, and dashboards can become very confusing as more and more of these kinds of objects are added to Splunk applications. You don't have to be a Splunk app designer to ensure that users can quickly and easily navigate to the searches, reports, views, and dashboards they need to do their job efficiently.

For more information, see "Define navigation for saved searches and reports" in this manual.

Review of summary index setup and usage. Summary indexes may be used by many teams across your deployment to run efficient searches on large volumes of data, but their usage also counts against your overall license volume. The knowledge manager can provide centralized oversight of summary index usage across your organization, ensuring that they are built correctly, used responsibly, and are shared as appropriate with users throughout your Splunk deployment.

Note: As of Release 4.1, summary index usage does not count against your overall license volume.


For more information, see "Use summary indexing for increased reporting efficiency" in this manual.
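As promised in the configuration file entry above, here is a minimal sketch of a search-time field extraction managed entirely through props.conf; the sourcetype name and the regular expression are hypothetical:

    [my_sourcetype]
    EXTRACT-auth = user=(?<user>\w+)\s+action=(?<action>\w+)

The named groups in the regular expression become the extracted field names, and the extraction is applied at search time to every event of that sourcetype.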

Prerequisites for knowledge management

Most knowledge management tasks are centered around "search time" event manipulation. In other words, a typical knowledge manager usually doesn't focus their attention on work that takes place before events are indexed, such as setting up data inputs, adjusting event processing activities, correcting default field extraction issues, creating and maintaining indexes, setting up forwarding and receiving, and so on.

However, we do recommend that all knowledge managers have a good understanding of these "Splunk admin" concepts. A solid grounding in these subjects enables knowledge managers to better plan out their approach towards management of knowledge objects for their deployment...and it helps them troubleshoot issues that will inevitably come up over time.

Here are some of the "admin" topics that knowledge managers should be familiar with, with Admin manual links to get you started:

Working with Splunk apps: If your deployment uses more than one Splunk app, you should get some background on how they're organized and how app object management works within multi-app deployments. See "What's an app?", "App architecture and object ownership", and "Manage app objects".

Configuration file management: Where are Splunk's configuration files? How are they organized? How do configuration files take precedence over each other? See "About configuration files" and "Configuration file precedence".

Indexing with Splunk: What is an index and how does it work? What is the difference between "index time" and "search time" and why is this distinction significant? Start with "What's a Splunk index?" and read the rest of the chapter. Pay special attention to "Index time vs search time".

Getting event data into Splunk: It's important to have at least a baseline understanding of Splunk data inputs. Check out "What Splunk can monitor" and read the other topics in this chapter as necessary.

Understand your forwarding and receiving setup: If your Splunk deployment utilizes forwarders and receivers, it's a good idea to get a handle on how they've been implemented, as this can affect your knowledge management strategy. Get an overview of the subject at "About forwarding and receiving".

Understand event processing: Get a good grounding in the steps that Splunk goes through to "parse" data before it indexes it. This knowledge can help you troubleshoot problems with your event data and recognize "index time" event processing issues. Start with "Overview of event processing" and read the entire chapter.


Default field extraction: Most field extraction takes place at search time, with the exception of certain default fields, which get extracted at index time. As a knowledge manager, most of the time you'll concern yourself with search-time field extraction, but it's a good idea to know how default field extraction can be managed when it's absolutely necessary to do so. This can help you troubleshoot issues with the host, source, and sourcetype fields that Splunk applies to each event. Start with "About default fields".

Managing users and roles: Knowledge managers typically do not directly set up users and roles. However, it's a good idea to understand how they're set up within your deployment, as this directly affects your efforts to share and promote knowledge objects between groups of users. For more information, start with "About users and roles" and read the rest of the chapter as necessary.


Organize and administrate knowledge objects

Curate Splunk knowledge with Manager

As your organization uses Splunk, knowledge is added to the base set of event data indexed within it. Searches are saved and scheduled. Tags are added to fields. Event types and transactions that group together sets of events are defined. Lookups and workflow actions are engineered.

The process of knowledge object creation starts out slowly, but can get complicated over time. It's easy to reach a point where users are "reinventing the wheel," creating searches that already exist, designing redundant event types, and so on. These things may not be a big issue if your user base is small, but they can cause unnecessary confusion and repetition of effort, especially as they accumulate over time.

This topic discusses how knowledge managers can use Splunk Manager to take charge of the knowledge objects in their Splunk system and show them who's boss. Splunk Manager can give a savvy and attentive knowledge manager a view into what knowledge objects are being created, who they're being created by, and (to some degree) how they are being used.

With Manager, you can easily:

• Create knowledge objects as necessary, either "from scratch" or through object cloning.
• Review knowledge objects as they are created, with an eye towards reducing redundancy, ensuring that naming standards are followed, and that "bad" objects are removed before they develop lots of downstream dependencies.
• Ensure that knowledge objects with relevancy beyond a particular working team, role, or app are made available to other teams, roles, and users of other apps.
• Delete knowledge objects that do not have significant "downstream" dependencies.

Note: This topic assumes that as a knowledge manager you have an admin role or a role with an equivalent permission set.

Using configuration files instead of Manager

In previous releases Splunk users edited Splunk's configuration files directly to add, update, or delete knowledge objects. Now they can use Manager, which provides a user-friendly interface with those very same configuration files.

We do recommend having some familiarity with configuration files. The reasons for this include:

• Some Manager functionality makes more sense if you understand how things work at the configuration file level. This is especially true for the Field extractions and Field transformations pages in Manager.
• Functionality exists for certain knowledge object types that isn't (or isn't yet) expressed in the Manager UI.


• Bulk deletion of obsolete, redundant, or improperly defined knowledge objects is only possible with configuration files.
• You may find that you prefer to work directly with configuration files. For example, if you're a long-time Splunk user, brought up on our configuration file system, it may be the medium in which you've grown accustomed to dealing with knowledge objects. Other users just prefer the level of granularity and control that configuration files can provide.

Wherever you stand with Splunk's configuration files, we want to make sure you can use them when you find it necessary to do so. To that end, you'll find that the Knowledge Manager manual includes instructions for handling various knowledge object types via configuration files. For more information, see the documentation of those types.

For general information about configuration files in Splunk, see the following topics in the Admin manual:

• About configuration files
• Configuration file precedence

You can find examples of the current configuration .spec and .example files in the "Configuration file reference" chapter of the Admin manual.

Monitor and organize knowledge objects

As a knowledge manager, you should periodically check up on the knowledge object collections in your Splunk implementation. You should be on the lookout for knowledge objects that:

• Fail to adhere to naming standards
• Are duplicates/redundant
• Are worthy of being shared with wider audiences
• Should be disabled or deleted due to obsolescence or poor design

Regular inspection of the knowledge objects in your system will help you detect anomalies that could become problems later on.

Example - Keeping tags straight

Most healthy Splunk implementations end up with a lot of tags, which are used to perform searches on clusters of field/value pairings. Over time, however, it's easy to end up with tags that have similar names but which produce surprisingly dissimilar results. This can lead to considerable confusion and frustration.

Here's a procedure you can follow for curating tags. It can easily be adapted for other types of knowledge objects handled through Manager.

1. Go to Manager > Tags > List by tag name.

2. Look for tags with similar or duplicate names that belong to the same app (or which have been promoted to global availability for all users). For example, you might find a set of tags like authentication and authentications in the same app, where one tag is linked to an entirely different set of field/value pairs than the other.

Alternatively, you may encounter tags with identical names except for the use of capital letters, as in crash and Crash. Tags are case-sensitive, so Splunk sees them as two separate knowledge objects.

Keep in mind that you may find legitimate tag duplications if you have the App context set to All, where tags belonging to different apps have the same name. This is often permissible--after all, an authentication tag for the Windows app will have to be associated with an entirely different set of field/value pairs than an authentication tag for the UNIX app, for example.

3. Try to disable or delete the duplicate or obsolete tags you find, if your permissions enable you to do so. However, be aware that there may be objects dependent on the tag that will be affected. If the tag is used in saved searches, dashboard searches, event types, or transactions, those objects will cease to function once the tag is removed or disabled. This can also happen if the object belongs to one app context and you attempt to move it to another app context.

For more information, see "Disable or delete knowledge objects," below.

4. If you create a replacement tag with a new, more distinctive name, ensure that it is connected to the same field/value pairs as the tag that you are replacing.
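For reference, tags live in tags.conf, one stanza per field/value pair, so a cleanup like the one above boils down to a small edit. A minimal sketch with hypothetical names, where the replacement tag is enabled and the near-duplicate is disabled:

    [eventtype=failed_login]
    authentication = enabled
    authentications = disabled

After this change, a search on tag=authentication still finds these events, while the redundant tag=authentications no longer matches anything.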

Using naming conventions to head off object nomenclature issues

If you set up naming conventions for your knowledge objects early in your implementation of Splunk, you can avoid some of the thornier object naming issues. For more information, see "Develop naming conventions for knowledge objects" in this manual.

Share and promote knowledge objects

As a knowledge manager, you can set knowledge object permissions to restrict or expand access to the variety of knowledge objects within your Splunk implementation.

In some cases you'll determine that certain specialized knowledge objects should only be used by people in a particular role, within a specific app. And in others you'll move to the other side of the scale and make universally useful knowledge objects globally available to all users in all apps. As with all aspects of knowledge management, you'll want to carefully consider the implications of these access restrictions and expansions.

When a Splunk user first creates a new saved search, event type, transaction, or similar knowledge object, it is only available to that user. To make that object available to more people, Manager provides the following options, which you can take advantage of if your permissions enable you to do so. You can:

• Make the knowledge object available globally to users of all apps (also referred to as "promoting" an object).
• Make the knowledge object available to all users of an app.
• Restrict (or expand) access to global or app-specific objects by user or role.
• Set read/write permissions at the app level for roles, to enable users to share or delete objects they do not own.

How do permissions affect knowledge object usage?

To illustrate how these choices can affect usage of a knowledge object, imagine that Bob, a user of the (fictional) Network Security app with a "Firewall Manager" role, creates a new event type named firewallbreach, which finds events that indicate firewall breaches. Here's a series of permissions-related issues that could come up, and the actions and results that would follow:

Issue: When Bob first creates firewallbreach, it is only available to him. Other users cannot see it or work with it. Bob decides he wants to share it with his fellow Network Security app users.
Action: Bob updates the permissions of the firewallbreach event type so that it is available to all users of the Network Security app, regardless of role. He also sets up the new event type so that all Network Security users can edit its definition.
Result: Anyone using Splunk in the Network Security app context can see, work with, and edit the firewallbreach event type. Users of other Splunk apps in the same Splunk implementation have no idea it exists.

Issue: A bit later on, Mary, the knowledge manager, realizes that only users in the Firewall Manager role should have the ability to edit or update the firewallbreach event type.
Action: Mary restricts the ability to edit the event type to the Firewall Manager role.
Result: Users of the Network Security app can use the firewallbreach event type in transactions, searches, dashboards, and so on, but now the only people that can edit the knowledge object are those with the Firewall Manager role and people with admin-level permissions (such as the knowledge manager). People using Splunk in other app contexts remain blissfully ignorant of the event type.

Issue: At some point a few people who have grown used to using the very handy firewallbreach event type in the Network Security app decide they'd like to use it in the context of the Windows app as well.
Action: They make their case to the knowledge manager, who promptly promotes the firewallbreach event type to global availability.
Result: Now, everyone that uses this implementation of Splunk can use the firewallbreach event type, no matter what app context they happen to be in. But the ability to update the event type definition is still confined to admin-level users and users with the Firewall Manager role.

Note: You may want to set your Splunk implementation up so that only people with admin-level roles can share and promote knowledge objects. This would make you (and your fellow knowledge managers) gatekeepers with approval capability over the sharing of new knowledge objects.

Permissions - Getting started

To change the permissions for a knowledge object, follow these steps:


1. In Manager, navigate to the page for the type of knowledge object that you want to update permissions for (such as Searches and reports or Event types).

2. Find the knowledge object that you created (use the filtering fields at the top of the page if necessary) and click its Permissions link.

3. On the Permissions page for the knowledge object in question, perform the actions in the following subsections, depending on how you'd like to change the object's permissions.

Make an object available to users of all apps

To make an object globally available to users of all apps in your Splunk implementation:

1. Navigate to the Permissions page for the knowledge object (following the instructions above).

2. Under [Knowledge object type] should appear in:, select All apps.

3. In the Permissions section, for Everyone, select a permission of either Read or Write:

• Read enables users to see and use the object, but not update its definition. In other words, when users only have Read permission for a particular saved search, they can see it in the top-level navigation (the "Searches & Reports" dropdown, for example) and they can run it. But they can't update the search string, change its time range, and save their changes.
• Write enables users to view, use, and update the defining details of an object as necessary.
• If neither Read nor Write is selected, then users cannot see or use the knowledge object.

4. Save the permission change.
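For the curious: Manager records these choices in the app's metadata files. Roughly, a globally shared saved search that only admins can edit might look like the following sketch in $SPLUNK_HOME/etc/apps/<App_name>/metadata/local.meta (the object name is hypothetical):

    [savedsearches/Firewall%20Breaches]
    access = read : [ * ], write : [ admin ]
    export = system

The export = system setting is what makes the object visible in all app contexts; without it, the object stays scoped to its app.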

Make an object available to users of a particular app

To restrict the usage of a knowledge object to a specific app, you first have to be in the context of that app. To do this, click the App dropdown in the upper right-hand corner of the screen and select the app to which you'd like to restrict the knowledge object.

• If the knowledge object is private, or shared globally, then all you have to do is navigate to the Permissions page for that object and select This app under [Knowledge object type] should appear in:. Then select a permission of either Read or Write for Everyone, as appropriate.
• If usage of a knowledge object is already restricted to an app and you want to switch its context to another app, click the Move link (it will only appear if you have sufficient permissions to move it). This will enable you to quickly and easily choose another app context for the knowledge object.

Keep in mind, however, that switching the app context of a knowledge object can have downstream consequences for objects that have been associated with it. For more information see "Disable or delete knowledge objects", below.


Restrict knowledge object access by role

You can use this method to lock down various knowledge objects from alteration by specific roles. You can arrange things so users in a particular role can use the knowledge object but not update it--or you can set it up so those users cannot see the object at all. In the latter case, the object will not show up for them in Manager, and they will not find any results when they search on it.

If you want to restrict the ability to see or update a knowledge object by role, simply navigate to the Permissions page for the object. If you want members of a role to:

• Be able to use the object and update its definition, give that role Read and Write access.
• Be able to use the object but be unable to update it, give that role Read access only (and make sure that Write is unchecked for the Everyone role).
• Be unable to see or use the knowledge object at all, leave Read and Write unchecked for that role (and unchecked for the Everyone role as well).

For more information about role-based permissions in Splunk, see "About users and roles" in the Admin manual.

A note about deleting users and roles with unshared objects

If a Splunk user leaves your team and you need to delete that user or role from the Splunk system, be aware that you will lose any knowledge objects belonging to them that have a sharing status of private. If you want to keep those knowledge objects, share them at the app or global level before deleting the user or role.

Disable or delete knowledge objects

Let's start off by saying that Manager makes it fairly easy to disable or delete knowledge objects, as long as your permissions enable you to do so. In Splunk, the ability to delete knowledge objects in Manager really depends on a set of factors:

• You cannot delete default knowledge objects that were delivered with Splunk (or with the app) via Manager. If the knowledge object definition resides in the app's default directory, it can't be removed via Manager. It can only be disabled (by clicking Disable). Only objects that exist in an app's "local" directory are eligible for deletion.
• You can delete knowledge objects that you have created, and which haven't been shared. Once a knowledge object you've created is shared with other users, your ability to delete it is revoked, unless you have write permissions for the app to which it belongs (see the next point).
• To delete all other knowledge objects, you need to have write permissions for the application to which they belong. This applies to knowledge objects that are shared globally as well as those that are only shared within an app--all knowledge objects belong to a specific app, no matter how they are shared.

App-level write permissions are usually only granted to users with admin-equivalent roles.

To sum up: the ability to edit a knowledge object has nothing to do with the ability to delete it. If you can't delete a particular knowledge object, you may still be able to disable it, which has essentially the same effect as deletion without removing the object from the system.

Deleting knowledge objects with downstream dependencies

You have to be careful about deleting knowledge objects with downstream dependencies, as this can have negative impacts.

For example, you could have a tag that looks like the duplicate of another, far more common tag. On the surface it would seem to be harmless to delete the duplicate tag. But what you may not realize is that this duplicate tag also happens to be part of a search that a very popular event type is based upon. And that popular event type is used in two important saved searches--the first is the basis for a well-used dashboard panel, and the other is used to populate a summary index that is used by searches that run several other dashboard panels. So if you delete that tag, the event type breaks, and everything downstream of that event type breaks.

This is why it is important to nip poorly named or defined knowledge objects in the bud, before they become inadvertently hard-wired into the workings of your deployment. The only way to identify the downstream dependencies of a particular knowledge object is to search on it, find out where it is used, and then search on those things to see where they are used--it can take a bit of detective work. There is no "one click" way to bring up a list of knowledge object downstream dependencies at this point.

If you really feel that you have to delete a knowledge object, and you're not sure if you've tracked down and fixed all of its downstream dependencies, you could try disabling it first to see what impact that has. If nothing seems to go seriously awry after a day or so, delete it.

Deleting knowledge objects in configuration files

Note that when you use Manager, you can only disable or delete one knowledge object at a time. If you need to remove large numbers of objects, the most efficient way to do it is by removing the knowledge object stanzas directly through the configuration files. Keep in mind that several versions of a particular configuration file can exist within your system. In most cases you should only edit the configuration files in $SPLUNK_HOME/etc/system/local/, to make local changes on a site-wide basis, or $SPLUNK_HOME/etc/apps/<App_name>/local/, if you need to make changes that apply only to a specific app.
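For example, clearing out a batch of redundant event types amounts to deleting their stanzas from the relevant local file. A minimal sketch, with a hypothetical stanza name:

    # In $SPLUNK_HOME/etc/apps/<App_name>/local/eventtypes.conf,
    # remove each obsolete stanza in its entirety:
    [redundant_eventtype]
    search = sourcetype=legacy_log error

Restart Splunk after editing configuration files so that the changes take effect.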


Do not try to edit configuration files until you have read and understood the following topics in the Admin manual:

• About configuration files
• Configuration file precedence

Develop naming conventions for knowledge objects

We suggest you develop naming conventions for your knowledge objects when it makes sense to do so. If the naming conventions you develop are followed consistently by all of the Splunk users in your organization, you'll find that knowledge objects become easier to use and that their purpose is much easier to discern at a glance.

You can develop naming conventions for just about every kind of knowledge object in Splunk. Naming conventions can help with object organization, but they can also help users differentiate between groups of saved searches, event types, and tags that have similar uses. And they can help identify a variety of things about the object that may not even be in the object definition, such as what teams or locations use the object, what technology it involves, and what it's designed to do.

Early development of naming conventions for your Splunk implementation will help you avoid confusion and chaos later on down the road.

Use the Common Information Model

Splunk's Common Information Model provides strategies for normalizing your approach to extracted field names, event type tagging, and host tagging. It includes:

• A list of standard custom fields
• An event type tagging system
• Lists of standard host tags

For more information, see "Understand and use the Common Information Model" in this manual.

Example - Set up a naming convention for saved searches

You work in the systems engineering group of your company, and as the knowledge manager for your Splunk implementation, it's up to you to come up with a naming convention for the saved searches produced by your team.

In the end you develop a naming convention that pulls together:

• Group: Corresponds to the working group(s) of the user saving the search.
• Search type: Indicates the type of search (alert, report, summary-index-populating).
• Platform: Corresponds to the platform to which the search applies.
• Category: Corresponds to the concern areas for the prevailing platforms.
• Time interval: The interval over which the search runs (or on which the search runs, if it is a scheduled search).


• Description: A meaningful description of the context and intent of the search, limited to one or two words if possible. Ensures the search name is unique.

Group: SEG, NEG, OPS, NOC
Search type: Alert, Report, Summary
Platform: Windows, iSeries, Network
Category: Disk, Exchange, SQL, Event log, CPU, Jobs, Subsystems, Services, Security
Time interval: <arbitrary>
Description: <arbitrary>

Possible saved searches using this naming convention:

• SEG_Alert_Windows_Eventlog_15m_Failures
• SEG_Report_iSeries_Jobs_12hr_Failed_Batch
• NOC_Summary_Network_Security_24hr_Top_src_ip
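In savedsearches.conf, a search named according to this convention might look like the following minimal sketch (the search string and schedule are hypothetical):

    [SEG_Alert_Windows_Eventlog_15m_Failures]
    search = eventtype=windows_eventlog failure
    enableSched = 1
    cron_schedule = */15 * * * *

Anyone scanning the file (or the Searches and reports page in Manager) can tell at a glance who owns the search, what it does, and how often it runs.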

Understand and use the Common Information Model

The Common Information Model is based on the idea that you can break down most log files into three components:

• fields
• event type tags
• host tags

With these three components a savvy knowledge manager should be able to set up their log files in a way that makes them easy for Splunk to process, and that normalizes noncompliant log files so they follow a similar schema. The Common Information Model details the standard fields, event type tags, and host tags that Splunk uses when it processes most IT data.

Normalizing the standard event format

This is the recommended format that should be used when events are generated or written to a system:

<timestamp> name="<name>" event_id=<event_id> <key>=<value>

Any number of field key-value pairs are allowed. For example:


2008-11-06 22:29:04 name="Failed Login" event_id=sshd:failure src_ip=10.2.3.4 src_port=12355 dest_ip=192.168.1.35 dest_port=22

The keys are those listed in "Standard fields" below. name and event_id are mandatory.

When events from a Cisco PIX log are made compliant with the Common Information Model format, the following PIX event:

Sep 2 15:14:11 10.235.224.193 local4:warn|warning fw07 %PIX-4-106023: Deny icmp src internet:213.208.19.33 dst eservices-test-ses-public:193.8.50.70 (type 8, code 0) by access-group "internet_access_in"

looks as follows:

2009-09-02 15:14:11 name="Deny icmp" event_id=106023 vendor=CISCO product=PIX log_level=4 dvc_ip=10.235.224.193 dvc_host=fw07 syslog_facility=local4 syslog_priority=warn src_ip=213.208.19.33 dest_ip=193.8.50.70 src_network=internet dest_network=eservices-test-ses-public icmp_type=8 icmp_code=0 proto=icmp rule_number="internet_access_in"
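Note that you don't always have to rewrite events at their source to achieve this kind of compliance; if an existing sourcetype already extracts the right values under nonstandard names, a search-time field alias may be enough. A minimal props.conf sketch, assuming a hypothetical sourcetype whose events carry src and dst fields:

    [cisco_pix]
    FIELDALIAS-cim = src AS src_ip dst AS dest_ip

After this, searches on the standard src_ip and dest_ip field names match these events as well.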

Standard fields

This table presents a list of standard fields that can be extracted from event data as custom search-time field extractions.

Please note that we strongly recommend that all of these field extractions be performed at search time. There is no need to add these fields to the set of default fields that Splunk extracts at index time.

For more information about the index time/search time distinction, see "Index time versus search time" in the Admin manual. For more information about performing field extractions at search time, see "Create search-time field extractions" in this manual.

action (string): The action specified by the event. For example, access, execution, or modification.
affected_user (string): The user that was affected by a change. For example, if user fflanda changed the name of user rhallen, rhallen is the affected_user.
affected_user_group (string): The user group that is affected by a change.
affected_user_group_id (string): The identifier of the group affected by a change.
affected_user_id (number): The identifier of the user affected by a change.
affected_user_privileges (enumeration): The privileges of the user affected by a change.
app (string): OSI layer 7 (application layer) protocol--for example HTTP, HTTPS, SSH, IMAP.
bytes_in (number): How many bytes this device/interface received.
bytes_out (number): How many bytes this device/interface transmitted.
channel (string): 802.11 channel number used by a wireless network.
category (string): A device-specific classification provided as part of the event.
count (number): The number of times the record has been seen.
cve (string): The Common Vulnerabilities and Exposures (CVE) reference value.
desc (string): The free-form description of a particular event.
dest_app (string): The name of the application being targeted.
dest_cnc_channel (string): The destination command and control service channel.
dest_cnc_name (string): The destination command and control service name.
dest_cnc_port (number): The destination command and control service port.
dest_country (string): The country associated with a packet's recipient.
dest_domain (string): The DNS domain that is being queried.
dest_host (string): The fully qualified host name of a packet's recipient. For HTTP sessions, this is the host header.
dest_int (string): The interface that is listening remotely or receiving packets locally.
dest_ip (ipv4 address): The IPv4 address of a packet's recipient.
dest_ipv6 (ipv6 address): The IPv6 address of a packet's recipient.
dest_lat (number): The (physical) latitude of a packet's destination.
dest_long (number): The (physical) longitude of a packet's destination.
dest_mac (mac address): The destination TCP/IP layer 2 Media Access Control (MAC) address of a packet's destination.
dest_nt_domain (string): The Windows NT domain containing a packet's destination.
dest_nt_host (string): The Windows NT host name of a packet's destination.
dest_port (port): The TCP/IP port to which a packet is being sent.
dest_record (string): The remote DNS resource record being acted upon.
dest_translated_ip (ipv4 address): The NATed IP address to which a packet is being sent.
dest_translated_port (number): The NATed port to which a packet is being sent.
dest_zone (string): The DNS zone that is being received by a slave as part of a zone transfer.
dhcp_pool (string): The name of a given DHCP pool on a DHCP server.
direction (string): The direction the packet is traveling, such as inbound or outbound.
duration (number): The amount of time the event lasted.
dvc_host (string): The fully qualified domain name of the device transmitting or recording the log record.
dvc_ip (ipv4 address): The IPv4 address of the device reporting the event.
dvc_ip6 (ipv6 address): The IPv6 address of the device reporting the event.
dvc_location (string): The free-form description of the device's physical location.
dvc_mac (mac address): The MAC (layer 2) address of the device reporting the event.
dvc_nt_domain (string): The Windows NT domain of the device recording or transmitting the event.
dvc_nt_host (string): The Windows NT host name of the device recording or transmitting the event.
dvc_time (timestamp): Time at which the device recorded the event.
end_time (timestamp): The event's specified end time.
event_id (number): A unique identifier that identifies the event. This is unique to the reporting device.
file_access_time (timestamp): The time the file (the object of the event) was accessed.
file_create_time (timestamp): The time the file (the object of the event) was created.
file_hash (string): A cryptographic identifier assigned to the file object affected by the event.
file_modify_time (timestamp): The time the file (the object of the event) was altered.
file_name (string): The name of the file that is the object of the event, with no information related to local file or directory structure.
file_path (string): The location of the file that is the object of the event, in terms of local file and directory structure.
file_permission (string): Access controls associated with the file affected by the event.
file_size (number): The size of the file that is the object of the event. Indicate whether Bytes, KB, MB, GB.
http_content_type (string): The HTTP content type.
http_method (string): The HTTP method used in the event.
http_referrer (string): The HTTP referrer listed in the event.
http_response (number): The HTTP response code.
http_user_agent (string): The HTTP user agent.
ip_version (number): The numbered Internet Protocol version - 4 or 6.
length (number): The length of the datagram, event, message, or packet.
log_level (string): The log level that was set on the device and recorded in the event.
name (string): The name of the event as reported by the device. The name should not contain information that's already being parsed into other fields from the event, such as IP addresses.
object_name (string): The object name (associated mainly with Windows).
object_type (string): The object type (associated mainly with Windows).
object_handle (string): The object handle (associated mainly with Windows).
outbound_interface (string): The network interface through which a packet was transmitted.
packets_in (number): How many packets this device/interface received.
packets_out (number): How many packets this device/interface transmitted.
pid (number): An integer assigned by the device operating system to the process creating the record.
priority (number): An environment-specific assessment of the importance of the event, based on elements such as event severity, business function of the affected system, or other locally defined variables.
process (string): The program that generated this record (such as a process name mentioned in the syslog header).
product (string): The product that generated the event.
product_version (number): The version of the product that generated the event.
proto (string): The OSI layer 3 (network layer) protocol--for example IP, ICMP, IPsec, ARP.
reason (string): The root cause of the result - "connection refused", "timeout", "crash", etc.
recipient (string): The person to whom an email message is sent.
record_class (string): The DNS resource record class - IN (internet - default), HS (Hesiod - historic), or CH (Chaos - historic).
record_type (string): The DNS resource record type - see the Wikipedia article on DNS record types.
result (string): The result of the action - succeeded/failed, allowed/denied.
rule_number (string): The firewall rule number or ACL number.
sender (string): The person responsible for sending an email message.
severity (string): The severity (or priority) of an event as reported by the originating device.
signature (string): The SID, as well as the signature identifiers used by other Intrusion Detection Systems; the Event Identifiers assigned by Windows-based operating systems to event records; and Cisco's message IDs.
src_country (string): The country from which the packet was sent.
src_domain (string): The DNS domain that is being remotely queried.
src_host (string): The fully qualified host name of the system that transmitted the packet. For Web logs, this is the HTTP client.
src_int (string): The interface that is listening locally or sending packets remotely.
src_ip (ipv4 address): The IPv4 address of the packet's source. For Web logs, this is the HTTP client.
src_ipv6 (ipv6 address): The IPv6 address of the packet's source.
src_lat (number): The (physical) latitude of the packet's source.
src_long (number): The (physical) longitude of the packet's source.
src_mac (mac address): The Media Access Control (MAC) address from which a packet was transmitted.
src_nt_domain (string): The Windows NT domain containing the machines that generated the event.
src_nt_host (string): The Windows NT hostname of the system that generated the event.
src_port (port): The network port from which a packet originated.
src_record (string): The local DNS resource record being acted upon.
src_translated_ip (ip address): The translated/NATed IP address from which a packet is being sent.
src_translated_port (number): The translated/NATed network port from which a packet is being sent.
src_zone (string): The DNS zone that is being transferred by the master as part of a zone transfer.
session_id (string): The session identifier. Multiple transactions build a session.
ssid (string): The 802.11 service set identifier (ssid) assigned to a wireless network.
start_time (timestamp): The event's specified start time.
subject (string): The email subject line.
syslog_facility (syslog facility): The application, process, or OS subsystem that generated the event.
syslog_priority (syslog priority): The criticality of an event, as recorded by UNIX syslog.
tcp_flags (enumeration): The TCP flag specified in the event. One or more of SYN, ACK, FIN, RST, URG, or PSH.
tos (hex): The hex bit that specifies TCP "type of service" (see http://en.wikipedia.org/wiki/Type_of_Service).
transaction_id (string): The transaction identifier.
transport (string): The transport protocol, such as TCP or UDP.
ttl (number): The "Time To Live" of a packet or datagram.
url (string): A Web address (Uniform Resource Locator, or URL) included in a record.
user (string): The login ID affected by the recorded event.
user_group (string): A user group that is the object of an event, expressed in human-readable terms.
user_group_id (string): The numeric identifier assigned to the user group event object.
user_id (number): System-assigned numeric identifier for the user affected by an event.
user_privilege (enumeration): The security context associated with the object of an event: one of administrator, user, or guest/anonymous.
user_subject (string): User that is the subject of an event; the one executing the action.
user_subject_id (number): ID number of the user that is the subject of an event; the one executing the action.
user_subject_privilege (enumeration): The security context associated with a recorded event: one of administrator, user, or guest/anonymous.
vendor (string): The vendor who made the product that generated the event.
vlan_id (number): The numeric identifier assigned to the virtual local area network specified in the record.
vlan_name (string): The name assigned to the VLAN in the event.

Standardize your event type tags

The Common Information Model suggests that you use a specific convention when tagging your event types. This convention requires that you set up three categories of tags, and that you give each event type in your system a single tag from each of these categories. The categories are object, action, and status.

This arrangement enables precise event type classification. The object tag denotes what the event is about. What object has been targeted? Is the event talking about a host, a resource, a file, or what? The action tag explains what has been done to the object (create, delete, modify, and so on). And the status tag provides the status of the action. Was it successful? Failed? Or was it simply an attempt? In addition to these three standard tags, you can add other tags as well.

The three tags under discussion here are:

<objecttag> <actiontag> <statustag>


Some examples of using the standard tags are:

• For a firewall deny event type: host communicate_firewall failure
• For a firewall accept event type: host communicate_firewall success
• For a successful database login: database authentication_verify success
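In tags.conf, applying the three-position convention to a hypothetical firewall_deny event type might look like this minimal sketch:

    [eventtype=firewall_deny]
    host = enabled
    communicate_firewall = enabled
    failure = enabled

A search on tag=failure then sweeps up every event type tagged with that status, regardless of the vendor or source involved.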

Object event type tags

Use one of these object tags in the first position as defined above.

application - An application-level event.
application av - An antivirus event.
application backdoor - An event using an application backdoor.
application database - A database event.
application database data - An event related to database data.
application dosclient - An event involving a DOS client.
application firewall - An event involving an application firewall.
application im - An instant message-related event.
application peertopeer - A peer-to-peer-related event.
host - A host-level event.
group - A group-level event.
resource - An event involving system resources.
resource cpu - An event involving the CPU.
resource file - An event involving a file.
resource interface - An event involving network interfaces.
resource memory - An event involving memory.
resource registry - An event involving the system registry.
os - An OS-level event.
os process - An event involving an OS-related process.
os service - An event involving an OS service.
user - A user-level event.


Action event type tags

Use one of these action tags in the second position as defined above.

Tag Explanation

access An event that accesses something.

access read An event that reads something.

access read copy An event that copies something.

access read copy archive An event that archives something.

access read decrypt An event that decrypts something.

access read download An event that downloads something.

access write An event that writes something.

authentication An event involving authentication.

authentication add An event adding authentication rules.

authentication delete An event deleting authentication rules.

authentication lock An event indicating an account lockout.

authentication modify An event modifying authentication rules.

authentication verify An event verifying identity.

authorization An event involving authorization.

authorization add Adding new priviliges.

authorization delete Deleting privileges.

authorization modify Changing privileges, e.g., chmod.

authorization verify Checking privileges for an operation.

check An event checking something.

check status An event checking something's status.

create An event that creates something.

communicate An event involving communication.

communicate connect An event involving making a connection.

communicate disconnect An event involving disconnecting.

communicate firewall An event passing through a firewall.

delete An event that deletes something.

execute An event that runs something.

execute restart An event that restarts something.

execute start An event that starts something.

execute stop An event that stops something.


modify An event that changes something.

modify attribute An event that changes an attribute.

modify attribute rename An event that renames something.

modify configuration An event that changes a configuration.

modify content A content-related event.

modify content append An event that appends new content onto existing content.

modify content clear An event that clears out content.

modify content insert An event that inserts content into existing content.

modify content merge An event that merges content.

substitute An event that replaces something.

Status event type tags

Use one of these status tags in the third position as defined above.

Tag Explanation

attempt An event marking an attempt at something.

deferred A deferred event.

failure A failed event.

inprogress An event marking something in progress.

report A report of a status.

success A successful event.

Optional tags

For those who want to use standard additional tags where they apply, here are some suggestions.

Tag Explanation

attack An event marking an attack.

attack exploit An event marking the use of an exploit.

attack bruteforce An event marking a brute force attack.

attack dos An event marking a denial of service attack.

attack escalation An event indicating a privilege escalation attack.

infoleak An event indicating an information leak.

malware An event marking malware action.

malware dosclient An event marking malware utilizing a DoS client.

malware spyware An event marking spyware.

malware trojan An event marking a trojan.

malware virus An event marking a virus.


malware worm An event marking a worm.

recon An event marking recon probes.

suspicious An event indicating suspicious activity.

Standardize your host tags

As you may know, it can be problematic to rename hosts directly. Because hosts are identified before event data is indexed, changes to host names are not applied to data that has already been indexed. It's far easier to use tags to group together events from particular hosts.

You can use standardized tags to describe specific hosts and what they do. There are a variety of approaches to host tagging, all of which can be used where appropriate. Some of these methods include:

• What service(s) the host is running.
• What OS the host is running.
• The department the host belongs to.
• What data the host contains.
• What cluster/round robin the host belongs to.

General host tags

These host tags are useful across the board. You can also develop lists of host tags that are appropriate for specific apps.

Tag Explanation

db This host is a database.

development This host is a development box.

dmz This host is in the DMZ.

dns This host is a DNS server.

email This host is an email server.

finance This host contains financial information.

firewall This host is a firewall.

highly_critical This host is highly critical for business purposes.

web This host is a Web server.
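As an illustration, a host that is both a Web server and sits in the DMZ might be tagged like this in tags.conf (the web01 host name is invented for illustration):

[host=web01]
web = enabled
dmz = enabled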


Data interpretation: Fields and field extractions

About fields

Fields are searchable name/value pairings in event data. All fields have names and can be searched with those names. ("Name/value pairings" are sometimes referred to as "key/value pairings.")

For example, look at the following search:

host=foo

In this search, host=foo is a way of indicating that you are searching for events with host fields that have values of foo. When you run this search, Splunk won't seek out events with different host field values. It also won't look for events containing other fields that share foo as a value. This means that this search gives you a more focused set of search results than you might get if you just put foo in the search bar.

As Splunk processes event data, it extracts and defines fields from that data, first at index time, and again at search time. These fields show up in the Field Picker after you run a search.

At index time Splunk extracts a small set of default fields for each event, including host, source, and sourcetype. Default fields are common to all events. Splunk can also extract custom indexed fields at index time; these are fields that you have configured for index-time extraction.

At search time Splunk automatically extracts certain fields. It:

• automatically identifies and extracts the first 50 fields that it finds in the event data that match obvious name/value pairs, such as user_id=jdoe or client_ip=192.168.1.1, which it extracts as examples of user_id and client_ip fields. (This 50-field limit is a default that can be modified by editing the [kv] stanza in limits.conf, as shown in the sketch after this list.)

• extracts any field explicitly mentioned in the search that it might otherwise have found through automatic extraction (but isn't among the first 50 fields identified).

• performs custom search field extractions that you have defined, either through the Interactive Field Extractor, the Extracted fields page in Manager, configuration file edits, or search commands such as rex.
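For example, here is a minimal sketch of what raising that limit might look like in limits.conf (the value 100 is arbitrary; consult the limits.conf reference for your version before changing it):

[kv]
limit = 100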

For an explanation of "search time" and "index time" see "Index time versus search time" in the Admin manual.

An example of automatic field extraction

This is an example of how Splunk automatically extracts fields without user help (as opposed to custom field extractions, which follow event-extraction rules that you define):

Say you search on sourcetype, a default field that Splunk automatically extracts for every event at index time. If your search is

sourcetype=veeblefetzer


for the past 24 hours, Splunk returns every event with a sourcetype of veeblefetzer in that time range. From this set of events, Splunk automatically extracts the first 50 fields that it can identify on its own. And it performs extractions of custom fields, based on configuration files. All of these fields will appear in the Field Picker when the search is complete.

Now, if a name/value combination like userlogin=fail appears for the first time 25,000 events into the search, and userlogin isn't among the set of custom fields that you've preconfigured, it likely won't be among the first 50 fields that Splunk finds on its own.

However, if you change your search to

sourcetype=veeblefetzer userlogin=*

then Splunk will be smart enough to find and return all events that include both the userlogin field and a sourcetype value of veeblefetzer, and the field will be available in the Field Picker along with the other fields that Splunk has extracted for this search.

Add and maintain custom search fields

To fully utilize the power of Splunk IT search, however, you need to know how to create and maintain custom search field extractions. Custom fields enable you to capture and track information that is important to your needs, but which isn't being discovered and extracted by Splunk automatically.

As a knowledge manager, you'll oversee the set of custom search field extractions created by users of your Splunk implementation, and you may define specialized groups of custom search fields yourself. This section of the Knowledge Manager manual discusses the various methods of field creation and maintenance (see the "Overview of search-time field extraction" topic) and provides examples showing how this functionality can be used.

You'll learn how to:

• create and administrate search-time field extractions through Splunk Manager.
• design and manage search-time field transforms through Splunk Manager.
• use the props.conf and transforms.conf configuration files to add and maintain search-time extractions.
• configure Splunk to parse multivalue fields.

Overview of search-time field extraction

This topic provides a brief overview of Splunk Web field extraction methods.

As you use Splunk, you will encounter situations that require the creation of new fields that will be additions to the set of fields that Splunk automatically extracts for you at index time and search time.


As a knowledge manager, you'll be managing field extractions for the rest of your team. In many cases you'll be defining fields that Splunk has not identified on its own, in an effort to make your event data more useful for searches, reports, and dashboards. However, you may also want to define field extractions as part of an event data normalization strategy, where you redefine existing fields and create new ones in an effort to reduce redundancies and increase the overall usability of the fields available to the other Splunk users on your team. (For more information, see "Understand and use the Common Information Model," in this manual.)

If you find that you need to create additional search-time field extractions, you have a number of ways to go about it. Splunk Web provides a variety of search-time field extraction methods. The search language also enables you to create temporary field extractions. And you can always add and maintain field extractions by way of configuration file edits.

For a detailed discussion of search-time field addition using methods based in Splunk Web, see "Extract and add new fields" in the User manual. We'll just summarize the methods in this subtopic and provide links to topics with in-depth discussions and examples.

Use interactive field extraction to create new fields

You can create custom fields dynamically using the interactive field extractor (IFX) in Splunk Web. IFX enables you to quickly turn any search into a field-extracting regular expression. You use IFX on the local indexer. For more information about using IFX, see "Extract fields interactively in Splunk Web" in the User manual.

Note: IFX is especially useful if you are not familiar with regular expression syntax and usage,because it will generate field extraction regexes for you (and enable you to test them).

To access IFX, run a search and then select "Extract fields" from the dropdown that appears beneath timestamps in the field results. IFX enables you to extract only one field at a time (although you can edit the regex it generates later to extract multiple fields).

Use Splunk Manager to add and maintain field extractions

You can use the Field extractions and Field transformations pages in Splunk Manager to review, edit, and create extracted fields.

The Field extractions page

The Field extractions page shows you the search-time field extractions in props.conf, and allows you to review, update, and create field extractions. You can use it to create and manage both basic "inline" search-time extractions (extractions that are defined entirely within props.conf) and more advanced search-time extractions that reference a field transformation component in transforms.conf. You can define field transformations in Manager through the Field transformations page (see below).


In Splunk Web, you navigate to the Field extractions page by selecting Manager > Fields > Field extractions.

For more information, see "Use the Field extractions page in Manager".

The Field transformations page

You can also use Manager to create more complex search-time field extractions that involve a transform component in transforms.conf. To do this, you couple an extraction from the Field extractions page with a field transform on the Field transformations page.

The Field transformations page displays search-time field transforms that have been defined in transforms.conf. Field transforms work with extractions set up in props.conf to enable advanced field extractions. With transforms, you can define field extractions that:

• Reuse the same field-extracting regular expression across multiple sources, source types, or hosts (in other words, configure one field transform for multiple field extractions).

• Apply more than one field-extracting regular expression to the same source, source type, or host (in other words, apply multiple field transforms to the same field extraction).

• Use a regular expression to extract fields from the values of another field (also referred to as a "source key").

In Splunk Web, you navigate to the Field transformations page by selecting Manager > Fields > Field transformations.

For more information, see "Use the Field transformations page in Manager".

Configure field extractions in props.conf and transforms.conf

You can also create and maintain field extractions by making edits directly to props.conf and transforms.conf. If this sounds like your kind of thing--and it may be, especially if you are an old-timey Splunk user, or just prefer working at the configuration file level of things--you can find all the details in "Create and maintain search-time extractions through configuration files," in this manual.

It's important to note that the configuration files do enable you to do more things with search-time field extractions than Manager currently does. For example, with the config files you can set up:

• Delimiter-based field extractions.
• Extractions for multivalue fields.
• Extractions of fields with names that begin with numbers or underscores (normally not allowed unless key cleaning is disabled).
• Formatting of extracted fields.


Use search commands to create field extractions

Splunk provides a variety of search commands that facilitate the extraction of fields in different ways. Here's a list of these commands:

• The rex search command performs field extractions using a Perl regular expression with named groups that you include in the search string (see the sketch after this list).

• The extract (or kv, for "key/value") search command extracts field/value pairs from search results. If you use extract without specifying any arguments, Splunk extracts fields using field extraction stanzas that have been added to props.conf. You can use extract to test any field extractions that you plan to add manually through conf files, to see if they extract field/value information as expected.

• Use multikv to extract field/value pairs from multiline, tabular-formatted events. It creates a new event for each table row and derives field names from the table title.

• xmlkv enables you to extract field/value pairs from xml-formatted event data, such as transactions from webpages.

• kvform extracts field/value pairs from events based on predefined form templates that describe how the values should be extracted. These templates are stored in $SPLUNK_HOME/etc/system/form/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.../form. For example, if form=sales_order, Splunk looks for a sales_order.form file and matches all of the events it processes against that form in an effort to extract values.
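As a quick sketch of the first of these commands, the following search extracts a hypothetical failed_user field for the duration of the search only (the field name and search terms are invented for illustration; nothing is written to configuration files):

sourcetype=syslog "login failed" | rex "user=(?<failed_user>\w+)" | stats count by failed_user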

For details about how these commands are used, along with examples, see either the Search Reference or the "Extract and add new fields" topic in the User manual.

Use the Field extractions page in Manager

Use the Field extractions page in Manager to manage search-time field extractions that have been added to props.conf. Field extractions can be added to props.conf when you use the Interactive Field Extractor, through direct props.conf edits, and when you create field extractions through the Field extractions page.

The Field extractions page enables you to:

• Review the overall set of search-time extractions that you have created or which your permissions enable you to see, for all Apps in your instance of Splunk.

• Create new search-time field extractions.

• Update permissions for field extractions. Field extractions created through the Interactive Field Extractor and the Field extractions page are initially only available to their creators until they are shared with others.

• Delete field extractions, if your app-level permissions enable you to do so, and if they are not default extractions that were delivered with the product. Default knowledge objects cannot be deleted. For more information about deleting knowledge objects, see "Curate Splunk knowledge with Manager" in this manual.


If you have "write" permissions for a particular search-time field extraction, the Field extractionspage enables you to:

• Update its regular expression, if it is an inline extraction.
• Add or delete named transforms that have been defined in transforms.conf or the Field transformations page in Manager, if it uses transforms.

Note: You cannot manage index-time field extractions via Manager. We don't recommend that you change your set of index-time field extractions, but if you find that you must do so, you have to modify your props.conf and transforms.conf configuration files manually. For more information about index-time field extraction configuration, see "Configure index-time field extractions" in the Getting Data In manual.

Navigate to the Field extractions page by selecting Manager > Fields > Field extractions.

Review search-time field extractions in Manager

To better understand how the Field extractions page in Manager displays your field extractions, it helps to understand how field extractions are set up in your props.conf and transforms.conf files.

Field extractions can be set up entirely in props.conf, in which case they are identified on the Field extractions page as inline field extractions. But some field extractions include a transforms.conf component called a field transform. To create or edit that component of the field extraction via Splunk Web, you use the Field transformations page in Manager.

For more information about transforms and the Field transformations page, see "Use the Field transformations page in Manager" in this manual.

For more information about field extraction setup directly in the props.conf and transforms.conf files, see "Create and maintain search-time field extractions through configuration files" in this manual.

Name column

The Name column in the Field extractions page displays the overall name of the field extraction, as it appears in props.conf. The format is:

<spec> : [EXTRACT-<name> | REPORT-<name>]

<spec> can be:
♦ <sourcetype>, the source type of an event.
♦ host::<host>, where <host> is the host for an event.
♦ source::<source>, where <source> is the source for an event.

EXTRACT-<name> field extractions are extractions that are wholly defined in props.conf (in other words, they do not reference a transform in transforms.conf). They are created automatically by field extractions made through IFX and certain search commands. You can also add them by making direct updates to the props.conf file. This kind of extraction is always associated with a field-extracting regular expression. On the Field extractions page, this regex appears in the Extraction/Transform column.

REPORT-<name> field extractions reference field transform stanzas in transforms.conf. This is where their field-extracting regular expressions are located. On the Field extractions page, the referenced field transform stanza is indicated in the Extraction/Transform column.

You can work with transforms in Manager through the Field transformations page. For more information see "Use the Field transformations page in Manager" in this manual.

Type column

There are two field extraction types: inline and uses transform.

• Inline extractions always have EXTRACT-<name> configurations. They are identified as such because they are entirely defined within props.conf; they do not reference external field transforms.

• Uses transform extractions always have REPORT-<name> configurations. As such they reference field transforms in transforms.conf. You can define field transforms directly in transforms.conf or via Manager using the Field transformations page.

Extraction/Transform column

In the Extraction/Transform column, Manager displays different things depending on the field extraction Type.

• For inline extraction types, Manager displays the regular expression that Splunk uses to extract the field. The named group (or groups) within the regex show you what field(s) it extracts.

For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test your regex by using it in a search with the rex search command. Splunk also maintains a list of useful third-party tools for writing and testing regular expressions.

• In the case of Uses transform extraction types, Manager displays the name of the transforms.conf field transform stanza (or stanzas) that the field extraction is linked to through props.conf. A field extraction can reference multiple field transforms if you want to apply more than one field-extracting regex to the same source, source type, or host. This can be necessary in cases where the field or fields that you want to extract appear in two or more very different event patterns.

For example, the Extraction/Transform column could display two values for a Uses transform extraction: access-extractions and ip-extractions. These may appear in props.conf as:

[access_combined]
REPORT-access = access-extractions, ip-extractions

In this example, access-extractions and ip-extractions are both names of field transform stanzas in transforms.conf. To work with those field transforms through Manager, go to the Field transformations page.


Add new field extractions

Click the New button at the top of the Field extractions page to add a new field extraction. The Add New page appears.

If you know how field extractions are set up in props.conf, you should find this to be pretty simple.

All of the fields described below are required.

1. Define a Destination app context for the field extraction. By default it will be the app context you are currently in.

2. Give the field extraction a Name, using underscores for spaces between words. In props.conf this is the <name> value for an EXTRACT or REPORT field extraction type.

3. Define the sourcetype, source, or host to which the extraction applies. Select sourcetype, source, or host and enter the value. This maps to the <spec> value in props.conf.

4. Define the extraction type. If you select Uses transform, enter the transform(s) involved in the Extraction/Transform field. If you select Inline, enter the regular expression used to extract the field (or fields) in the Extraction/Transform field.

For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test your regex by using it in a search with the rex search command. Splunk also maintains a list of useful third-party tools for writing and testing regular expressions.

Important: The capturing groups in your regex must identify field names that only contain alpha-numeric characters or underscores.

• Valid characters for field names are a-z, A-Z, 0-9, or _ .
• Field names cannot begin with 0-9 or _ . Leading underscores are reserved for Splunk's internal variables.
• International characters are not allowed.

Splunk applies the following "key cleaning" rules to all extracted fields, either by default or through a custom configuration:

• All characters that are not in a-z, A-Z, and 0-9 ranges are replaced with an underscore (_).
• All leading underscores and 0-9 characters are removed from extracted field names.

To disable this behavior for a specific field extraction, you have to manually modify both props.conf and transforms.conf. For more information, see "Create and maintain search-time field extractions through configuration files" in this manual.

Note: You cannot turn off key cleaning for inline field extractions (field extractions that do not require a field transform component).


Example - Add a new error code field

This shows how you would define an extraction for a new err_code field. The field can be identified by the occurrence of device_id= followed by a word within brackets and a text string terminating with a colon. The field should be extracted from events related to the testlog source type.

In props.conf this extraction would look like:

[testlog]
EXTRACT-errors = device_id=\[\w+\](?<err_code>[^:]+)

Here's how you would set that up through the Add new page: give the extraction the Name errors, select sourcetype and enter testlog, choose the Inline type, and enter the regular expression in the Extraction/Transform field.

Note: You can find a version of this example in the "Create and maintain search-time field extractions through configuration files" topic in this manual, which shows you how to set up field extractions using the props.conf file.

Update existing field extractions

To edit an existing field extraction, locate the field extraction and click its name in the Name column.

This takes you to a details page for that field extraction. In the Extraction/Transform field, what you can do depends on the type of extraction that you are working with.

• If the field extraction is an inline extraction, you can edit the regular expression it uses to extract fields.


• If the field extraction uses one or more transforms, you can specify the transform or transforms involved (put them in a comma-separated list if there is more than one). The transforms can then be created or updated via the Field transformations page.

Note: Uses transform field extractions must include at least one valid transforms.conf field transform stanza name.

Update field extraction permissions

When a field extraction is created through an inline method (such as IFX or a search command) it is initially only available to its creator. To make it so that other users can use the field extraction, you need to update its permissions. To do this, locate the field extraction on the Field extractions page and select its Permissions link. This opens the standard permission management page used in Manager for knowledge objects.

On this page you can set up role-based permissions for the field extraction, and determine whether it is available to users of one specific App, or globally to users of all Apps. For more information about managing permissions with Manager, see "Curate Splunk knowledge with Manager," in this manual.

Delete field extractions

On the Field extractions page in Manager, you can delete field extractions if your permissions enable you to do so. You won't be able to delete default field extractions (extractions that were delivered with the product and which are stored in the "default" directory of an app).

Click Delete for the field extraction that you want to remove.

Note: Take care when deleting objects that have downstream dependencies. For example, if your field extraction is used in a search that in turn is the basis for an event type that is used by five other saved searches (two of which are the foundation of dashboard panels), all of those other knowledge objects will be negatively impacted by the removal of that extraction from the system. For more information about deleting knowledge objects, see "Curate Splunk knowledge with Manager" in this manual.

Use the Field transformations page in Manager

The Field transformations page in Manager enables you to manage the "transform" components of search-time field extractions, which reside in transforms.conf. Field transforms can be created either through direct edits to transforms.conf or by addition through the Field transformations page.

Note: Every field transform has at least one field extraction component. But "inline" field extractions do not need to have a field transform component.


The Field transformations page enables you to:

• Review the overall set of field transforms that you have created or which your permissions enable you to see, for all Apps in your instance of Splunk.

• Create new search-time field transforms. For more information about situations that call for the use of field transforms, see "When to use the Field transformations page," below.

• Update permissions for field transforms. Field transforms created through the Field transformations page are initially only available to their creators until they are shared with others. You can only update field transform permissions if you own the transform, or if your role's permissions enable you to do so.

• Delete field transforms, if your app-level permissions enable you to do so, and if they are not default field transforms that were delivered with the product. Default knowledge objects cannot be deleted. For more information about deleting knowledge objects, see "Curate Splunk knowledge with Manager" in this manual.

If you have "write" permissions for a particular field transform, the Field transformations page enablesyou to:

• Update its regular expression and change the key the regular expression applies to.
• Define or update the field transform format.

Navigate to the Field transformations page by selecting Manager > Fields > Field transformations.

When to use the Field transformations page

While you can define most search-time field extractions entirely within props.conf (or the Field extractions page in Manager), some advanced search-time field extractions require a transforms.conf component called a field transform. This component can be defined and managed through the Field transformations page.

You set up search-time field extractions with a field transform component when you need to:

• Reuse the same field-extracting regular expression across multiple sources, source types, or hosts (in other words, configure one field transform for multiple field extractions). If you find yourself using the same regex to extract fields for different sources, source types, and hosts, you may want to set it up as a transform. Then, if you find that you need to update the regex, you only have to do so once, even though it is used in more than one field extraction.

• Apply more than one field-extracting regular expression to the same source, source type, or host (in other words, apply multiple field transforms to the same field extraction). This is sometimes necessary in cases where the field or fields that you want to extract from a particular source/source type/host appear in two or more very different event patterns.

• Use a regular expression to extract fields from the values of another field (also referred to as a "source key"). For example, you might pull a string out of a url field value, and have that be a value of a new field.

Note: All index-time field extractions are coupled with one or more field transforms. You cannot manage index-time field extractions via Manager, however--you have to use the props.conf and transforms.conf configuration files. For more information about index-time field extraction configuration, see "Configure index-time field extractions" in the Admin manual.

It's also important to note that you can do more things with search-time field transforms (such as setting up delimiter-based field extractions and configuring extractions for multivalued fields) if you configure them directly within transforms.conf. See the section on field transform setup in "Create and maintain search-time field extractions through configuration files" in this manual for more information.

Review and update search-time field transforms in Manager

To better understand how the Field transformations page in Manager displays your field transforms, it helps to understand how search-time field extractions are set up in your props.conf and transforms.conf files.

A typical field transform looks like this in transforms.conf:

[banner]
REGEX = /js/(?<license_type>[^/]*)/(?<version>[^/]*)/login/(?<login>[^/]*)
SOURCE_KEY = uri

This transform matches its regex against uri field values, and extracts three fields as named groups: license_type, version, and login.

In props.conf, that transform is matched to the source .../banner_access_log* like so:

[source::.../banner_access_log*]
REPORT-banner = banner

This means the regex is only matched to uri fields in events coming from the .../banner_access_log source.

But you can match it to other sources, source types, and hosts if necessary. This is something you can't do with inline field extractions (field extractions set up entirely within props.conf).

Note: By default, transforms are matched to a SOURCE_KEY value of _raw, in which case their regexes are applied to the entire event, not just fields within that event.

The Name column

The Name column of the Field transformations page displays the names of the search-time transforms that your permissions enable you to see. These names are the stanza names in transforms.conf. The transform example presented above appears in the list of transforms as banner.

Click on a transform name to see the detail information for that particular transform.

Reviewing and editing transform details

The details page for a field transform enables you to view and update its regular expression, key, and event format. For the banner transform that we described at the start of this subtopic, for example, the details page displays the transform's regular expression and its uri key.

If you have the permissions to do so, you can edit the regex, key, and event format. Keep in mind that these edits can affect multiple field extractions defined in props.conf and the Field extractions page, if the transform has been applied to more than one source, source type, or host.

Create a new field transform

To create a new field transform:

1. First, navigate to the Field transformations page and click the New button.

2. Identify the Destination app for the field transform, if it is not the app you are currently in.

3. Give the field transform a Name. This equates to the stanza name for the transform in transforms.conf. When you save this transform, this is the name that appears in the Name column on the Field transformations page. (This is a required field.)

4. Enter a Regular expression for the transform. (This is a required field.)

5. Optionally define a Key for the transform. This corresponds to the SOURCE_KEY option in transforms.conf. By default it is set to _raw, which means the regex is applied to entire events. To have the regex be applied to values of a specific field, replace _raw with the name of that field. You can only use fields that are present when the field transform is executed.

6. Optionally specify the Event format. This corresponds to the FORMAT option in transforms.conf. For example, you could have an event that contains strings for a field name and its corresponding field value. You first design a regex that extracts those strings, and then you use the FORMAT of $1::$2 to have the first string be the field name, and the second string be the field value.
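To illustrate step 6, here is a minimal sketch of the transform stanza that would result from this procedure (the kv_pairs stanza name and the key="value" event pattern are invented for illustration). The first capturing group supplies the field name and the second supplies the field value:

[kv_pairs]
REGEX = (\w+)="([^"]*)"
FORMAT = $1::$2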

Regular expression syntax and usage

For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test your regex by using it in a search with the rex search command. Splunk also maintains a list of useful third-party tools for writing and testing regular expressions.


Important: The capturing groups in your regex must identify field names that contain only alpha-numeric characters or underscores.

• Valid characters for field names are a-z, A-Z, 0-9, or _ .
• Field names cannot begin with 0-9 or _ . Leading underscores are reserved for Splunk's internal variables.
• International characters are not allowed.

Splunk applies the following "key cleaning" rules to all extracted fields when they are extracted at search time, either by default or through a custom configuration:

1. All characters that are not in a-z, A-Z, and 0-9 ranges are replaced with an underscore (_).

2. When key cleaning is enabled (it is enabled by default), Splunk removes all leading underscores and 0-9 characters from extracted fields.

To disable this behavior for a specific field extraction, you have to manually modify both props.conf and transforms.conf. For more information, see "Create and maintain search-time field extractions through configuration files" in this manual.

Note: You cannot turn off key cleaning for inline field extractions (field extractions that do not require a field transform component).

Example - Extract both field names and their corresponding field values from an event

You can use the Event format attribute in conjunction with a properly designed regular expression to set up field transforms that extract both a field name and its corresponding field value from each matching event.

Here's an example, using a transform that is delivered with Splunk.

The bracket-space field transform has a regular expression that finds field name/value pairs within brackets in event data. It will reapply this regular expression until all of the matching field/value pairs in an event are extracted.

As we stated earlier in this topic, field transforms are always associated with a field extraction. On the Field extractions page in Manager, you can see that the bracket-space field transform is associated with the osx-asl : REPORT-asl extraction.

Update field transform permissions

When a field transform is first created, by default it is only available to its creator. To make it so that other users can use the field transform, you need to update its permissions. To do this, locate the field transform on the Field transformations page and select its Permissions link. This opens the standard permission management page used in Manager for knowledge objects.

On this page you can set up role-based permissions for the field transform, and determine whether it is available to users of one specific App, or globally to users of all Apps. For more information about managing permissions with Manager, see "Curate Splunk knowledge with Manager," in this manual.

Delete field transforms

On the Field transformations page in Manager, you can delete field transforms if your permissions enable you to do so.

Click Delete for the field transform that you want to remove.

Note: Take care when deleting knowledge objects that have downstream dependencies. For example, if the field extracted by your field transform is used in a search that in turn is the basis for an event type that is used by five other saved searches (two of which are the foundation of dashboard panels), all of those other knowledge objects will be negatively impacted by the removal of that transform from the system. For more information about deleting knowledge objects, see "Curate Splunk knowledge with Manager" in this manual.

Create and maintain search-time field extractions through configuration files

While you can now set up and manage search-time field extractions via Splunk Manager, it's important to understand how they are handled at the props.conf and transforms.conf level, because those are the configuration files that the Field extractions and Field transformations pages in Manager read from and write to.

Many knowledge managers, especially those who have been using Splunk for some time, find it easier to manage their custom fields through configuration files, which can be used to add, maintain, and review libraries of custom field additions for their teams.

This topic shows you how you can:

• Set up basic "inline" search-time field extractions through edits to props.conf.


• Design more complex search-time field extractions through a combination of edits to props.conf and transforms.conf.

Regular expressions and field name syntax

Splunk uses regular expressions, or regexes, to extract fields from event data. When you use the interactive field extractor (IFX), Splunk attempts to generate regexes for you, but it can only create regular expressions that extract one field. When you set up field extractions manually through configuration files, on the other hand, you have to provide the regex yourself--but you can design regexes that extract two or more fields from matching events if necessary.

For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test your regex by using it in a search with the rex search command. Splunk also maintains a list of useful third-party tools for writing and testing regular expressions.

Important: The capturing groups in your regex must identify field names that contain only alpha-numeric characters or underscores. See "Use proper field name syntax," below.

Use proper field name syntax

Splunk only accepts field names that contain alpha-numeric characters or an underscore:

• Valid characters for field names are a-z, A-Z, 0-9, or _ .
• Field names cannot begin with 0-9 or _ . Leading underscores are reserved for Splunk's internal variables.
• International characters are not allowed.

Splunk applies the following "key cleaning" rules to all extracted fields when they are extracted at search time, either by default or through a custom configuration:

1. All characters that are not in a-z, A-Z, and 0-9 ranges are replaced with an underscore (_). You can disable this by setting CLEAN_KEYS=false in the transforms.conf stanza.

2. When key cleaning is enabled (it is enabled by default), Splunk removes all leading underscores and 0-9 characters from extracted fields.

You can disable key cleaning for a particular field transform by setting CLEAN_KEYS=false in the transforms.conf stanza for the extraction.
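For example, here is a minimal sketch of a transform that preserves a leading underscore in an extracted field name (the stanza name, regex, and _session_id field are invented for illustration):

[underscore_keys]
REGEX = sid=(?<_session_id>\w+)
CLEAN_KEYS = false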

Create basic search-time field extractions with props.conf edits

You can create basic search-time field extractions by editing the props.conf configuration file. You can find props.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. (We recommend using the latter directory if you want to make it easy to transfer your data customizations to other search servers.)

Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.

For more information on configuration files in general, see "About configuration files" in the Admin manual.

Steps for defining basic custom field extractions with props.conf

1. All extraction configurations in props.conf are restricted by a specific source, source type, or host. Start by identifying the source, source type, or host that provides the events from which you would like your field to be extracted.

Note: For information about hosts, sources, and source types, see "About default fields (host, source, source type, and more)" in the Admin manual.

2. Determine a pattern to identify the field in the event.

3. Write a regular expression to extract the field from the event.

4. Add your regex to props.conf and link it to the source, source type, or host that you identified in the first step.

5. If your field value is a portion of a word, you must also add an entry to fields.conf. See the example "Create a field from a subtoken" below.

Edit the props.conf file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.

Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.

6. Restart Splunk for your changes to take effect.

Add a regex stanza to props.conf

Follow this format when adding a field extraction stanza to props.conf:

[<spec>]
EXTRACT-<class> = <regular expression>

• <spec> can be:
♦ <sourcetype>, the source type of an event.
♦ host::<host>, where <host> is the host for an event.
♦ source::<source>, where <source> is the source for an event.

• <class> is the extraction class. Precedence rules for classes:
♦ For each class, Splunk takes the configuration from the highest precedence configuration block.
♦ If a particular class is specified for a source and a sourcetype, the class for source wins out.
♦ Similarly, if a particular class is specified in ../local/ for a <spec>, it overrides that class in ../default/.

• <regular expression> = create a regex that recognizes your custom field value. The regex is required to have named capturing groups; each group represents a different extracted field.

Note: Unlike the procedure for configuring the default set of fields that Splunk extracts at index time, transforms.conf requires no DEST_KEY since nothing is being written to the index during search-time field extraction. Fields extracted at search time are not persisted in the index as keys.

Note: For "inline" search-time field extractions, which are defined entirely within props.conf,props.conf uses EXTRACT-<class>. When transforms are involved, this changes. Search-timefield extractions using transforms use REPORT-<value> (see the section on complex fieldextractions for more info). And index-time field extractions, which are always constructed both inprops.conf and transforms.conf, use TRANSFORMS-<value>.

Splunk follows precedence rules when it runs search-time field extractions. It runs inline field extractions (EXTRACT-<class>) first, and then runs field extractions that involve transforms (REPORT-<class>).

Inline (props.conf only) search-time field extraction examples

Here is a set of examples of search-time custom field extraction, set up using props.conf only.

Add a new error code field

This example shows how to create a new "error code" field by configuring a field extraction in props.conf. The field can be identified by the occurrence of device_id= followed by a word within brackets and a text string terminating with a colon. The field should be extracted from events related to the testlog source type.

In props.conf, add:

[testlog]
EXTRACT-errors = device_id=\[\w+\](?<err_code>[^:]+)

Extract multiple fields using one regex

This is an example of a field extraction that pulls out five separate fields. You can then use these fields in concert with some event types to help you find port flapping events and report on them.

Here's a sample of the event data that the fields are being extracted from:

#%LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet9/16, changed state to down

The stanza in props.conf for the extraction looks like this:

[syslog]
EXTRACT-port_flapping = Interface\s(?<interface>(?<media>[^\d]+)(?<slot>\d+)\/(?<port>\d+))\,\schanged\sstate\sto\s(?<port_status>up|down)

Note that five separate fields are extracted as named groups: interface, media, slot, port, and port_status.

The following two steps aren't required for field extraction--they show you what you might do with the extracted fields to find port flapping events and then report on them.

Next, define a couple of event types in eventtypes.conf:

[cisco_ios_port_down]
search = "changed state to down"

[cisco_ios_port_up]
search = "changed state to up"

Finally, create a saved search in savedsearches.conf that ties much of the above together to find port flapping and report on the results:

[port flapping]
search = eventtype=cisco_ios_port_down OR eventtype=cisco_ios_port_up starthoursago=3 | stats count by interface,host,port_status | sort -count

Create a field from a subtoken

If your field value is a smaller part of a token, you must add an entry to fields.conf. For example, suppose your field's value is "123" but it occurs as "foo123" in your event.

Configure props.conf as explained above. Then, add an entry to fields.conf:

[<fieldname>]
INDEXED = False
INDEXED_VALUE = False

• Fill in <fieldname> with the name of your field.
♦ For example, [url] if you've configured a field named "url."

• Set INDEXED and INDEXED_VALUE to false.
♦ This tells Splunk that the value you're searching for is not a token in the index.

Create advanced search-time field extractions with field transforms

While you can define most search-time field extractions entirely within props.conf, some advanced search-time field extractions require an additional component called a field transform. This section shows you how to configure field transforms in transforms.conf.

Field transforms contain a field-extracting regular expression and other attributes that govern the way the transform extracts fields. Field transforms are always created in conjunction with field extraction stanzas in props.conf--they cannot stand alone.


Your search-time field extractions require a field transform component if you need to:

• Reuse the same field-extracting regular expression across multiple sources, source types, or hosts (in other words, configure one field transform for multiple field extractions). If you find yourself using the same regex to extract fields for different sources, source types, and hosts, you may want to set it up as a transform. Then, if you find that you need to update the regex, you only have to do so once, even though it is used in more than one field extraction.

• Apply more than one field-extracting regular expression to the same source, source type, or host (in other words, apply multiple field transforms to the same field extraction). This is sometimes necessary in cases where the field or fields that you want to extract from a particular source/source type/host appear in two or more very different event patterns.

• Set up delimiter-based field extractions. Delimiter-based extractions come in handy when your event data presents field-value pairs (or just field values) that are separated by delimiters such as commas, colons, bars, line breaks, and tab spaces.

• Configure extractions for multivalued fields. When you do this, Splunk appends additional field values to the field as it finds them in the event data.

• Extract fields with names that begin with numbers or underscores. Ordinarily key cleaning removes leading numeric characters and underscores from field names, but you can configure your transform to turn this functionality off if necessary.

You can also configure transforms to:

• Extract fields from the values of another field (other than _raw) by using the SOURCE_KEY attribute.

• Apply special formatting to the information being extracted, by using the FORMAT attribute.

Both of these configurations can now be set up directly in the regex, however; see the "First, define a field transform" section below for more information about how to do this.

NOTE: If you need to concatenate a set of regex extractions into a single field value, you can do this with the FORMAT attribute, but only if you set it up as an index-time extraction. For example, if you have a string like 192(x)0(y)2(z)1 in your event data, you can extract it at index time as an ip address field value in the format 192.0.2.1. For more information, see "Configure index-time field extractions" in the Admin Manual. However, we DO NOT RECOMMEND that you make extensive changes to your set of indexed fields--do so sparingly, if at all.

Steps for defining custom search-time field extractions with field transforms

1. All extraction configurations in props.conf are restricted by a specific source, source type, or host. Start by identifying the source, source type, or host that provides the events from which you would like your field to be extracted.

Note: For more information about sources, source types, or hosts, see "About default fields (host, source, sourcetype, and more)" in the Admin manual.

2. Determine a pattern to identify the field in the event.

3. Define a regular expression that uses this pattern to extract the field from the event. (Note: If your event lists field/value pairs or just field values, you can create a delimiter-based field extraction that won't require a regex; see the information on the DELIMS attribute, below, for more information.)

4. Create a field transform in transforms.conf that utilizes this regex (or delimiter configuration). The transform can also define a source key and/or event value formatting.

Edit the transforms.conf file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.

Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.

5. In props.conf, create a field extraction stanza that is linked to the host, source, or source type that you identified in step 1. Add a reference to the transform you defined in transforms.conf. (Create additional field extraction stanzas for other hosts, sources, and source types that refer to the same transform if necessary.)

Edit the props.conf file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.

Note: Do not edit files in $SPLUNK_HOME/etc/system/default/.

6. Restart Splunk for your changes to take effect.

First, define a field transform

Follow this format when defining a search-time field transform in transforms.conf:

[<unique_stanza_name>]
REGEX = <regular expression>
SOURCE_KEY = <string>
FORMAT = <string>
DELIMS = <quoted string list>
FIELDS = <quoted string list>
MV_ADD = <bool>
CLEAN_KEYS = <bool>

• The <unique_stanza_name> is required for all search-time transforms.

• REGEX is a regular expression that operates on your data to extract fields. It is required for all search-time field transforms unless you are setting up a delimiter-based extraction, in which case you use DELIMS instead.

♦ Name-capturing groups in the REGEX are extracted directly to fields, which means that you don't have to specify FORMAT for simple field extraction cases.

♦ If the REGEX extracts both the field name and its corresponding value, you can use the following special capturing groups to skip specifying the mapping in FORMAT: _KEY_<string>, _VAL_<string>.

♦ For example, the following are equivalent:

Using FORMAT:

REGEX = ([a-z]+)=([a-z]+)
FORMAT = $1::$2

Not using FORMAT:

REGEX = (?<_KEY_1>[a-z]+)=(?<_VAL_1>[a-z]+)

• SOURCE_KEY is optional. Use it to identify a field whose values the transform regex should be applied to.

♦ By default, SOURCE_KEY is set to _raw, which means it is applied to the entire event.
♦ For search-time transforms, the key can be any field that is present at the time that the field transform is executed.

• FORMAT is optional. Use it to specify the format of the field/value pair(s) that you are extracting, including any field names or values you want to add. You don't need to specify the FORMAT if you have a simple REGEX with name-capturing groups.

♦ Defaults to an empty string.
♦ For search-time transforms, this is the pattern for the FORMAT field:

FORMAT = <field-name>::<field-value>(<field-name>::<field-value>)*

where:
field-name = <string>|$<extracting-group-number>
field-value = <string>|$<extracting-group-number>

♦ Examples of search-time FORMAT usage:

1. FORMAT = first::$1 second::$2 third::other-value
2. FORMAT = $1::$2 $4::$3

• DELIMS is optional. Use it in place of REGEX when dealing with delimiter-based field extractions, where field values--or field/value pairs--are separated by delimiters such as commas, colons, spaces, tab spaces, line breaks, and so on.

♦ Delimiters must be quoted with " " (use \ to escape).
♦ If the event contains full delimiter-separated field/value pairs, you enter two sets of quoted delimiters for DELIMS. The first set of quoted delimiters separates the field/value pairs. The second set of quoted delimiters separates the field name from its corresponding value.

♦ If the events only contain delimiter-separated values (no field names), you use one set of quoted delimiters to separate the values. Then you use the FIELDS attribute to apply field names to the extracted values (see FIELDS below). Alternately, Splunk reads even tokens as field names and odd tokens as field values.

♦ Splunk consumes consecutive delimiter characters unless you specify a list of field names.

♦ This example of DELIMS usage applies to an event where field/value pairs are separated by '|' symbols, and the field names are separated from their corresponding values by '=' symbols:

[pipe_eq]
DELIMS = "|", "="


• FIELDS is used in conjunction with DELIMS when you are performing delimiter-based field extraction, but you only have field values to extract. Use FIELDS to provide field names for the extracted field values, in list format according to the order in which the values are extracted.

Note: If field names contain spaces or commas, they must be quoted with " " (to escape, use \).

Here's an example of a delimiter-based extraction where three field values appear in an event. They are separated by a comma and then a space.

[commalist]
DELIMS = ", "
FIELDS = field1, field2, field3

• MV_ADD is optional. Use it when you have events that repeat the same field but with different values. When MV_ADD = true, Splunk makes any field that is used more than once in an event (but with different values) a multivalued field and appends each value it finds for that field.

• When set to false, Splunk keeps the first value found for a field in an event and discards every subsequent value found for that same field in that same event.

CLEAN_KEYS is optional. It controls whether or not the system strips leading underscores and 0-9 characters from the field names it extracts (see the subtopic "Use proper field name syntax," above, for more information).

• By default, CLEAN_KEYS is always set to true for transforms.
• Add CLEAN_KEYS = false to your transform if you need to extract field names (keys) with leading underscores and/or 0-9 characters.

Second, configure a field extraction and associate it with the field transform

Follow this format when you're associating a search-time field transform with a field extraction stanza in props.conf. <unique_transform_stanza_name> is the name of the field transform stanza that you are associating with the field extraction.

You can associate multiple field transform stanzas with a single field extraction by listing them after the initial <unique_transform_stanza_name>, separated by commas. (For more information, see the example later in this topic.)

[<spec>]
REPORT-<class> = <unique_transform_stanza_name>

<spec> can be:
• <sourcetype>, the source type of an event.
• host::<host>, where <host> is the host for an event.
• source::<source>, where <source> is the source for an event.

<class> is the extraction class. Precedence rules for classes (a sketch follows these definitions):
• For each class, Splunk takes the configuration from the highest precedence configuration block.

• If a particular class is specified for a source and a sourcetype, the class for source wins out.

• Similarly, if a particular class is specified in ../local/ for a <spec>, it overrides that class in ../default/.


<unique_transform_stanza_name> is the name of your field transform stanza from transforms.conf.

• <class> is any value you want to give to your stanza to identify its name-space.
• Transforms are applied in the specified order.
• If you need to change the order, control it by rearranging the list.
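Here's a sketch of the precedence rules in props.conf (the stanza names and transform names here are hypothetical):

[mysourcetype]
REPORT-netfields = sourcetype_transform

[source::/var/log/myapp.log]
REPORT-netfields = source_transform

For an event whose source is /var/log/myapp.log and whose source type is mysourcetype, the netfields class is taken from the source stanza, so source_transform is the configuration that applies.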

Note: Index-time field extractions use TRANSFORMS-<class> = <unique_transform_stanza_name>. For more information, see "Configure index-time field extractions" in the Admin Manual.

Examples of custom search-time field extractions using field transforms

These examples present custom field extraction use cases that require you to configure one or more field transform stanzas in transforms.conf and then reference them in a props.conf field extraction stanza.

Configuring a field extraction that utilizes multiple field transforms

This example of search-time field transform setup demonstrates how:

• you can create transforms that pull varying field name/value pairs from events.
• you can create a field extraction that references two or more field transforms.

Let's say you have logs that contain multiple field name/field value pairs. While the fields vary from event to event, the pairs always appear in one of two formats.

The logs often come in this format:

[fieldName1=fieldValue1] [fieldName2=fieldValue2]

However, at times they are more complicated, logging multiple name/value pairs as a list, in which case the format looks like:

[headerName=fieldName1] [headerValue=fieldValue1], [headerName=fieldName2] [headerValue=fieldValue2]

Note that the list items are separated by commas, and that each fieldName is matched with a corresponding fieldValue. In these secondary cases you still want to pull out the field names and values so that the search results are

fieldName1=fieldValue1
fieldName2=fieldValue2

and so on.

To make things clearer, here's an example of an HTTP request event that combines both of the above formats.

[method=GET] [IP=10.1.1.1] [headerName=Host] [headerValue=www.example.com], [headerName=User-Agent] [headerValue=Mozilla], [headerName=Connection] [headerValue=close] [byteCount=255]

You want to develop a single field extraction that would pull the following field/value pairs from that event:

method=GET
IP=10.1.1.1
Host=www.example.com
User-Agent=Mozilla
Connection=close
byteCount=255

Solution

To efficiently and reliably pull out both formats of field/value pairs, you'll want to design two different regexes that are optimized for each format. One regex will identify events with the first format and pull out all of the matching field/value pairs. The other regex will identify events with the other format and pull out those field/value pairs.

You then create two unique transforms in transforms.conf--one for each regex--and then unite them in the corresponding field extraction stanza in props.conf.

The first transform you add to transforms.conf catches the fairly conventional [fieldName1=fieldValue1] [fieldName2=fieldValue2] case.

[myplaintransform]
REGEX = \[(?!(?:headerName|headerValue))([^\s\=]+)\=([^\]]+)\]
FORMAT = $1::$2

The second transform (also added to transforms.conf) catches the slightly more complex [headerName=fieldName1] [headerValue=fieldValue1], [headerName=fieldName2] [headerValue=fieldValue2] case:

[mytransform]
REGEX = \[headerName\=(\w+)\],\s\[headerValue=([^\]]+)\]
FORMAT = $1::$2

Both transforms use the <fieldName>::<fieldValue> FORMAT to match each field name in the event with its corresponding value. This setting in FORMAT enables Splunk to keep matching the regex against a matching event until every matching field/value combination is extracted.

Finally, this field extraction stanza, which you create in props.conf, references both of the field transforms:

[mysourcetype]
KV_MODE = none
REPORT-a = mytransform, myplaintransform

Note that, besides using multiple field transforms, the field extraction stanza also sets KV_MODE=none. This disables automatic field/value extraction for the identified source type (while letting your manually defined extractions continue). It ensures that these new regexes aren't overridden by automatic field extraction, and it also helps increase your search performance. (See the following subsection for more on disabling key/value extraction.)

Configuring delimiter-based field extraction

You can use the DELIMS attribute in field transforms to configure field extractions for events where field values or field/value pairs are separated by delimiters such as commas, colons, tab spaces, and more.

For example, say you have a recurring multiline event where a different field/value pair sits on a separate line, and each pair is separated by a colon followed by a tab space. Here's a sample event:

ComponentId: Application Server
ProcessId: 5316
ThreadId: 00000000
ThreadName: P=901265:O=0:CT
SourceId: com.ibm.ws.runtime.WsServerImpl
ClassName:
MethodName:
Manufacturer: IBM
Product: WebSphere
Version: Platform 7.0.0.7 [BASE 7.0.0.7 cf070942.55]
ServerName: sfeserv36Node01Cell\sfeserv36Node01\server1
TimeStamp: 2010-04-27 09:15:57.671000000
UnitOfWork:
Severity: 3
Category: AUDIT
PrimaryMessage: WSVR0001I: Server server1 open for e-business
ExtendedMessage:

Now you could set up a bulky, wordy search-time field extraction stanza in props.conf that handles all of these fields:

[activityLog]
LINE_BREAKER = [-]{8,}([\r\n]+)
SHOULD_LINEMERGE = false
EXTRACT-ComponentId = ComponentId:\t(?<ComponentId>.*)
EXTRACT-ProcessId = ProcessId:\t(?<ProcessId>.*)
EXTRACT-ThreadId = ThreadId:\t(?<ThreadId>.*)
EXTRACT-ThreadName = ThreadName:\t(?<ThreadName>.*)
EXTRACT-SourceId = SourceId:\t(?<SourceId>.*)
EXTRACT-ClassName = ClassName:\t(?<ClassName>.*)
EXTRACT-MethodName = MethodName:\t(?<MethodName>.*)
EXTRACT-Manufacturer = Manufacturer:\t(?<Manufacturer>.*)
EXTRACT-Product = Product:\t(?<Product>.*)
EXTRACT-Version = Version:\t(?<Version>.*)
EXTRACT-ServerName = ServerName:\t(?<ServerName>.*)
EXTRACT-TimeStamp = TimeStamp:\t(?<TimeStamp>.*)
EXTRACT-UnitOfWork = UnitOfWork:\t(?<UnitOfWork>.*)
EXTRACT-Severity = Severity:\t(?<Severity>.*)
EXTRACT-Category = Category:\t(?<Category>.*)
EXTRACT-PrimaryMessage = PrimaryMessage:\t(?<PrimaryMessage>.*)
EXTRACT-ExtendedMessage = ExtendedMessage:\t(?<ExtendedMessage>.*)

But that solution is pretty over-the-top. Is there a more elegant way to handle it that would remove the need for all these EXTRACT lines? Yes!


Configure the following stanza in transforms.conf:

[activity_report]
DELIMS = "\n", ":\t"

This states that the field/value pairs in the event are on separate lines ("\n"), and then specifies that the field name and field value on each line are separated by a colon and tab space (":\t").

To complete this configuration, rewrite the wordy props.conf stanza mentioned above as:

[activityLog]
LINE_BREAKER = [-]{8,}([\r\n]+)
SHOULD_LINEMERGE = false
REPORT-activity = activity_report

These two brief configurations will extract the same set of fields as before, but they leave less room for error and are more flexible.

Handling events with multivalued fields

You can use the MV_ADD attribute to extract fields in situations where the same field is used more than once in an event, but has a different value each time. Ordinarily, Splunk only extracts the first occurrence of a field in an event; every subsequent occurrence is discarded. But when MV_ADD is set to true in transforms.conf, Splunk treats the field like a multivalue field and extracts each unique field/value pair in the event.

Say you have a set of events that look like this:

event1.epochtime=1282182111 type=type1 value=value1 type=type3 value=value3
event2.epochtime=1282182111 type=type2 value=value4 type=type3 value=value5 type=type4 value=value6

See how the type and value fields are repeated several times in each event? What you'd like to do is search type=type3 and have both of these events be returned. Or you'd like to run a count(type) report on these two events that returns 5.

So, what you want to do is create a custom multivalue extraction of the type field for these events. Here's how you would set up your transforms.conf and props.conf files to enable it:

First, transforms.conf:

[mv-type]
REGEX = type=(?<type>\w+)
MV_ADD = true

Then, in props.conf for your sourcetype or source, set:

REPORT-type = mv-type
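With these configurations in place, and assuming the events above carry a hypothetical source type such as typed_events, the search

sourcetype=typed_events type=type3

returns both events, and

sourcetype=typed_events | stats count(type)

returns a count of 5, because every extracted value of the type field is retained.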


Disabling automatic search-time extraction for specific sources, source types, or hosts

You can disable automatic search-time field extraction for specific sources, source types, or hosts through edits in props.conf. Add KV_MODE = none for the appropriate [<spec>] in props.conf.

Note: Custom field extractions set up manually via the configuration files or Manager will still be processed for the affected source, source type, or host when KV_MODE = none.

[<spec>]
KV_MODE = none

<spec> can be:

• <sourcetype> - an event source type.
• host::<host>, where <host> is the host for an event.
• source::<source>, where <source> is the source for an event.
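For example, to turn off automatic key/value extraction for a single (hypothetical) source, you might use:

[source::/var/log/myapp.log]
KV_MODE = none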

Configure multivalue fields

Multivalue fields are fields that can appear multiple times in an event and have a different value for each appearance. One of the more common examples of multivalue fields is email address fields, which typically appear two to three times in a single sendmail event--once for the sender, another time for the list of recipients, and possibly a third time for the list of Cc addresses, if one exists. If all of these fields are labeled identically (as "AddressList," for example), they lose the meaning that they might otherwise have if they're identified separately as "From", "To", and "Cc".

Splunk parses multivalue fields at search time, and enables you to process the values in the search pipeline. Search commands that work with multivalue fields include makemv, mvcombine, mvexpand, and nomv. For more information on these and other commands see the topic on multivalue fields in the User manual, and the Search Reference manual.

Use the TOKENIZER key to configure multivalue fields in fields.conf. TOKENIZER uses a regular expression to tell Splunk how to recognize and extract multiple field values for a recurring field in an event. Edit fields.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.

For more information on configuration files in general, see "About configuration files" in the Admin manual.

For a primer on regular expression syntax and usage, see Regular-Expressions.info. You can test regexes by using them in searches with the rex search command. Splunk also maintains a list of useful third-party tools for writing and testing regular expressions.


Configure a multivalue field via fields.conf

Define a multivalue field by adding a stanza for it in fields.conf. Then add a line with the TOKENIZER key and a corresponding regular expression that shows how the field can have multiple values.

Note: If you have other attributes to set for a multivalue field, set them in the same stanza underneath the TOKENIZER line. See the fields.conf topic in the Admin manual for more information.

[<field name 1>]
TOKENIZER = <regular expression>

[<field name 2>]
TOKENIZER = <regular expression>

• <regular expression> should indicate how the field in question can take on multiple values.

• TOKENIZER defaults to empty. When TOKENIZER is empty, the field can only take on a single value.

• Otherwise the first group is taken from each match to form the set of field values.
• The TOKENIZER key is used by the where, timeline, and stats commands. It also provides the summary and XML outputs of the asynchronous search API.

Note: Tokenization of indexed fields (fields extracted at index time) is not supported. If you have set INDEXED=true for a field, you cannot also use the TOKENIZER key for that field. You can use a search-time extraction defined in props.conf and transforms.conf to break an indexed field into multiple values.

Example

The following examples from $SPLUNK_HOME/etc/system/README/fields.conf.example break the email fields To, From, and Cc into multiple values.

[To]
TOKENIZER = (\w[\w\.\-]*@[\w\.\-]*\w)

[From]
TOKENIZER = (\w[\w\.\-]*@[\w\.\-]*\w)

[Cc]
TOKENIZER = (\w[\w\.\-]*@[\w\.\-]*\w)


Data classification: Event types and transactions

About event types

Event types are a categorization system to help you make sense of your data. Event types let you sift through huge amounts of data, find similar patterns, and create alerts and reports.

Events versus event types

An event is a single record of activity within a log file. An event typically includes a timestamp and provides information about what occurred on the system being monitored or logged.

An event type is a user-defined field that simplifies search by letting you categorize events. Event types let you classify events that have common characteristics. When your search results come back, they're checked against known event types. An event type is applied to an event at search time if that event matches the event type definition in eventtypes.conf. You tag or save event types after indexing your data.

Event type classification

There are several ways to create your own event types. Define event types via Splunk Web or through configuration files, or save any search as an event type. When saving a search as an event type, you may want to use the punct field to craft your searches. The punct field helps you narrow down searches based on the structure of the event.

Use the punct field to search on similar events

Because the format of an event is often unique to an event type, Splunk indexes the punctuation characters of events as a field called punct. The punct field stores the first 30 punctuation characters in the first line of the event. This field is useful for finding similar events quickly.

When you use punct, keep in mind:

• Quotes and backslashes are escaped.
• Spaces are replaced with an underscore (_).
• Tabs are replaced with a "t".
• Dashes that follow alphanumeric characters are ignored.
• Interesting punctuation characters are:

",;-#$%&+./:=?@\\'|*\n\r\"(){}<>[]^!"

The punct field is not available for events in the _audit index because those events are signed using PKI at the time they are generated.


For an introduction to the punct field and other methods of event classification, see the "Classify and group similar events" topic in the User manual.

Punct examples

This event:

####<Jun 3, 2005 5:38:22 PM MDT> <Notice> <WebLogicServer> <bea03> <asiAdminServer> <WrapperStartStopAppMain> <>WLS Kernel<> <> <BEA-000360> <Server started in RUNNING mode>

Produces this punctuation:

####<_,__::__>_<>_<>_<>_<>_<>_

This event:

172.26.34.223 - - [01/Jul/2005:12:05:27 -0700] "GET /trade/app?action=logout HTTP/1.1" 200 2953

Produces this punctuation:

..._-_-_[:::_-]_\"_?=_/.\"__
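Because punct is an indexed field, you can search on it directly to retrieve events that share a structure. For example, a sketch of such a search for events shaped like the web access event above (the value is quoted, and a trailing wildcard loosens the match):

sourcetype=access_combined punct="..._-_-_[:::_-]*"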

Event type discovery

Pipe any search to the typelearner command and create event types directly from Splunk Web. The file eventdiscoverer.conf is mostly deprecated, although you can still specify terms to ignore when learning new event types in Splunk Web.

Create new event types

The simplest way to create a new event type is through Splunk Web. Save an event type much in the same way you save a search. For more information, see "Define and maintain event types in Splunk Web" in this manual.

You can also create new event types by modifying eventtypes.conf. For more about saving searches as event types, see the "Classify and group similar events" topic in the User manual.

Event type tags

Tag event types to organize your data into categories. There can be multiple tags per event. For more information about event type tagging, see the "Tag event types" topic in this manual.


Configuration files for event types

Event types are stored in eventtypes.conf.

Terms for event type discovery are set in eventdiscoverer.conf.

Define and maintain event types in Splunk Web

Any search that does not involve a pipe operator or a subsearch can be saved as an event type. A single event can match multiple event types.

Any event types you create through Splunk Web are automatically added to eventtypes.conf in $SPLUNK_HOME/etc/users/<your-username>/<app>/local/, where <app> is the app you were in when you created the event type. If you change the permissions on the event type to make it available to all users (either in the app, or globally to all apps), Splunk moves the event type to $SPLUNK_HOME/etc/apps/<app>/local/.

Save a search as an event type

To save a search as an event type:

• Enter the search and run it.
• Select the Actions... dropdown and click Save as event type...

The Save Event Type dialog box pops up, pre-populated with your search terms.

• Name the event type.
• Optionally add one or more tags for the event type, comma-separated.
• Click Save.

You can now use your event type in searches. If you named your event type foo, you'd use it in a search like this:

eventtype=foo

Automatically find and build event types

Unsure whether you have any interesting event types in your IT data? Splunk provides utilities that dynamically and intelligently locate and create useful event types:

• Find event types: The findtypes search command analyzes a given set of events and identifies common patterns that could be turned into potentially useful event types.

• Build event types: The Build Event Type utility enables you to dynamically create event types based on events returned by searches.


Find event types

To use the event type finder, add this to the end of your search:

...| findtypes

Searches that use the findtypes command return a breakdown of the most common groups of events found in the search results. They are:

• hierarchically ordered in terms of "coverage" (frequency). This helps you easily identify kinds of events that are subsets of larger event groupings.

• coupled with searches that can be used as the basis for event types that will help you locate similar events.

By default, findtypes returns the top 10 potential event types found in the sample, in terms of the number of events that match each kind of event discovered. You can increase this number by adding a max argument:

findtypes max=30

Splunk also indicates whether or not the event groupings discovered with findtypes have already been associated with other event types.

Note: The findtypes command analyzes 5000 events at most to return these results. You can lower this number using the head command for a more efficient search:

...| head 1000 | findtypes

Test potential searches before saving them as event types

When you identify a potentially useful event grouping, test the search associated with it to see if it returns the results you want. Click Test for the event grouping in which you are interested to see its associated search run in a separate window. After the search runs, review the results it returns to determine whether or not it is capturing the specific information you want.


Save a tested search as an event type

When you find a search that returns the right collection of results, save it as an event type by clicking Save for the event grouping with which it is associated. The Save Event Type dialog appears. Enter a name for the event type, and optionally identify one or more tags that should be associated with it, separated by commas. You can also edit the search if necessary.

Build event types

If you find an event in your search results that you'd like to base an event type on, open the dropdown event menu (find the down arrow next to the event timestamp) and click Build event type. Splunk takes you to the Build Event Type utility. You can use this utility to design a search that returns a select set of events, and then create an event type based on that search.

The Build Event Type utility finds a set of sample events that are similar to the one you selected from your search results. In the Event type features sidebar, you'll find possible field/value pairings that you can use to narrow down the event type search further.

The Build Event Type utility also displays a search string under Generated event type at the top of the page. This is the search that the event type you're building will be based upon. As you select other field/value pairs in the Event type features sidebar, the Generated event type updates to include those selections. The list of sample events updates as well, to reflect the kinds of events that the newly modified event type search would return.

If you want to edit the event type search directly, click Edit. This brings up the Edit Event Type dialog, which you can use to edit the search string.

Test potential searches before saving them as event types

When you build a search that you think might be a useful event type, test it. Click Test to see the search run in a separate window.

Save a tested search as an event type

If you test a search and it looks like it's returning the correct set of events, you can click Save to save it as an event type. The Save Event Type dialog appears. Enter a name for the event type, and optionally identify one or more tags that should be associated with it, separated by commas. You can also edit the search if necessary.

Add and maintain event types in Manager

The Event Types page in Manager enables you to view and maintain details of the event types that you have created or which you have permission to edit. You can also add new event types through the Event Types page. Event types displayed on the Event Types page may be available globally (system-wide) or they may apply to specific Apps.

Adding an event type in Manager

To add an event type through Manager, navigate to the Event Types page and click New. Splunk takes you to the Add New event types page.


From this page you enter the new event type's Destination App, Name, and the Search string that ultimately defines the event type (see "Save a search as an event type", above).

Note: All event types are initially created for a specific App. To make a particular event type available to all users on a global basis, you have to locate the event type on the Event Types page, click its Permissions link, and change the This app only selection to All apps.

You can optionally include Tags for the event type. For more information about tagging event types and other kinds of Splunk knowledge, see "About tags and aliases" in this manual.

You can also optionally select a Priority for the event type, where 1 is the highest priority and 10 is the lowest. The Priority setting is important for common situations where you have events that fit two or more event types. When the event turns up in search results, Splunk displays the event types associated with the event in a specific order. You use the Priority setting to ensure that certain event types take precedence over others in this display order.

If you have a number of overlapping event types, or event types that are subsets of larger ones, you may want to give the precisely focused event types a higher priority. For example, you could easily have a set of events that are part of a wide-ranging system_error event type. Within that large set of events, you could have events that also belong to more precisely focused event types like critical_disc_error and bad_external_resource_error.

In a situation like this, you could give the system_error event type a Priority of 10, while giving the other two error codes Priority values in the 1 to 5 range. This way, when events that match both system_error and critical_disc_error appear in search results, the critical_disc_error event type is always listed ahead of the system_error event type.

Maintaining event types in Manager

To update the details of an event type, locate it in the list on the Event Types page in Manager, and click its name. Splunk takes you to the details page for the event type, where you can edit the Search string, Tags, and Priority for the event type, if you have the permissions to do so. You can also update permissions for event types and delete event types through the Event Types page, if you have edit permissions for them.


Configure event types directly in eventtypes.conf

You can add new event types and update existing event types by configuring eventtypes.conf. There are a few default event types defined in $SPLUNK_HOME/etc/system/default/eventtypes.conf. Any event types you create through Splunk Web are automatically added to $SPLUNK_HOME/etc/system/local/eventtypes.conf.

Configuration

Make changes to event types in eventtypes.conf. Use $SPLUNK_HOME/etc/system/README/eventtypes.conf.example as an example, or create your own eventtypes.conf.

Edit eventtypes.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. For more information on configuration files in general, see "About configuration files" in the Admin manual.

[$EVENTTYPE]

• Header for the event type.
• $EVENTTYPE is the name of your event type.
• You can have any number of event types, each represented by a stanza and any number of the following attribute/value pairs.

Note: If the name of the event type includes field names surrounded by the percent character (e.g. %$FIELD%) then the value of $FIELD is substituted at search time into the event type name for that event. For example, an event type with the header [cisco-%code%] that has code=432 becomes labeled [cisco-432].

disabled = <1 or 0>

• Toggle event type on or off.
• Set to 1 to disable.

search = <string>

• Search terms for this event type.
• For example: error OR warn.

tags = <string>

• Space-separated words that are used to tag an event type.

description = <string>


• Optional human-readable description of the event type.

priority = <integer>

• Splunk uses this value to determine the order in which it displays matching event types for an event. 1 is the highest, and 10 is the lowest.

Note: You can tag eventtype field values the same way you tag any other field/value combination. See the tags.conf spec file for more information.
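For example, to tag the web event type defined below, you might add a stanza like this to tags.conf (the tag name webtag is hypothetical):

[eventtype=web]
webtag = enabled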

Example

Here are two event types; one is called web, and the other is called fatal.

[web]
search = html OR http OR https OR css OR htm OR shtml OR xls OR cgi

[fatal]
search = FATAL

Disable event types

Disable an event type by adding disabled = 1 to the event type's stanza in eventtypes.conf:

[$EVENTTYPE]
disabled = 1

$EVENTTYPE is the name of the event type you wish to disable.

So if you want to disable the web event type, add the following entry to its stanza:

[web]
disabled = 1

Configure event type templates

Event type templates create event types at search time. Define event type templates in eventtypes.conf. Edit eventtypes.conf in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.

For more information on configuration files in general, see "About configuration files" in the Admin manual.


Event type template configuration

Event type templates use a field name surrounded by percent characters to create event types at search time, where the %$FIELD% value is substituted into the name of the event type.

[$NAME-%$FIELD%]
$SEARCH_QUERY

So if the search query in the template returns an event where $FIELD=bar, Splunk creates an event type titled $NAME-bar for that event.

Example

[cisco-%code%]
search = cisco

If a search on "cisco" returns an event that has code=432, Splunk creates an event type titled"cisco-432".

About transactions

A transaction is any group of conceptually related events that spans time. A transaction type is a configured transaction, saved as a field in Splunk. Any number of data sources can generate transactions over multiple log entries.

For example, a customer shopping in an online store could generate a transaction across multiple sources. Web access events might share a session ID with the event in the application server log; the application server log might contain the account ID, transaction ID, and product ID; the transaction ID may live in the message queue with a message ID, and the fulfillment application may log the message ID along with the shipping status. All of this data represents a single user transaction.

Here are some other examples of transactions:

• Web access events
• Application server events
• Business transactions
• E-mails
• Security violations
• System failures

Transaction search

Transaction search is useful for a single observation of any physical event stretching over multiple logged events. Use the transaction command to define a transaction or override transaction options specified in transactiontypes.conf.

To learn more, read "Search for transactions" in this manual.


Configure transaction types

You may want to persist the transaction search you've created. Or you might want to create a lasting transaction type. You can save transactions by editing transactiontypes.conf. Define transactions by creating a stanza and listing specifications.

To learn more about configuring transaction types, read "Define transactions" in this manual.

When to use stats instead of transactions

Transactions aren't the most efficient method to compute aggregate statistics on transactional data. If you want to compute aggregate statistics over transactions that are defined by data in a single field, use the stats command.

For example, if you wanted to compute the statistics of the duration of a transaction defined by the field session_id:

* | stats min(_time) AS earliest max(_time) AS latest by session_id | eval duration=latest-earliest | stats min(duration) max(duration) avg(duration) median(duration) perc95(duration)

Similarly, if you wanted to compute the number of hits per clientip in an access log:

sourcetype=access_combined | stats count by clientip | sort -count

Also, if you wanted to compute the number of distinct sessions (parameterized by cookie) per clientip in an access log:

sourcetype=access_combined | stats dc(cookie) as sessions by clientip | sort -sessions

Read the stats command reference for more information about using this command.

Search for transactions

Search for transactions using the transaction search command either in Splunk Web or at the CLI. The transaction command yields groupings of events which can be used in reports. To use transaction, either call a transaction type (that you configured via transactiontypes.conf), or define transaction constraints in your search by setting the search options of the transaction command.

Search options

Transactions returned at search time consist of the raw text of each event, the shared event types, and the field values. Transactions also have additional data that is stored in the fields: duration and transactiontype.

• duration contains the duration of the transaction (the difference between the timestamps of the first and last events of the transaction).

• transactiontype is the name of the transaction (as defined in transactiontypes.conf by the transaction's stanza name).
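Because these fields are added to each transaction, you can use them later in the search pipeline. For example, this sketch keeps only transactions that lasted longer than a minute:

sourcetype=access_combined | transaction clientip | search duration > 60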


You can add transaction to any search. For best search performance, craft your search and then pipe it to the transaction command. For more information see the topic on the transaction command in the Search Reference manual.

Follow the transaction command with the following options. Note: Some transaction options do not work in conjunction with others.

[field-list]

• This is a comma-separated list of fields, such as ...|transaction host,cookie
• If set, each event must have the same field(s) to be considered part of the same transaction.
• Events with common field names and different values will not be grouped.

• For example, if you add ...|transaction host, then a search result that has host=mylaptop can never be in the same transaction as a search result with host=myserver.

• A search result that has no host value can be in a transaction with a result that has host=mylaptop.

match=closest

• Specify the matching type to use with a transaction definition.
• The only value supported currently is closest.

maxspan=[<integer> s|m|h|d]

• Set the maximum time span for the transaction.
• Can be in seconds, minutes, hours or days. For example: 5s, 6m, 12h or 30d.
• Defaults to maxspan=-1, for an "all time" timerange.

maxpause=[<integer> s|m|h|d]

• Set the maximum pause between the events in a transaction.
• Requires there be no pause between the events within the transaction greater than maxpause.
• If the value is negative, the maxpause constraint is disabled.
• Defaults to maxpause=-1.

startswith=<transam-filter-string>

• A search or eval-filtering expression which, if satisfied by an event, marks the beginning of a new transaction.

• For example:
startswith="login"
startswith=(username=foobar)
startswith=eval(speed_field < max_speed_field)
startswith=eval(speed_field < max_speed_field/12)
• Defaults to "".

endswith=<transam-filter-string>

64

Page 68: Splunk-4.1.7-Knowledge

• A search or eval-filtering expression which, if satisfied by an event, marks the end of a transaction.

• For example:
endswith="logout"
endswith=(username=foobar)
endswith=eval(speed_field < max_speed_field)
endswith=eval(speed_field < max_speed_field/12)
• Defaults to "".

For startswith and endswith, <transam-filter-string> is defined with the following syntax: "<search-expression>" | (<quoted-search-expression>) | eval(<eval-expression>)

• <search-expression> is a valid search expression that does not contain quotes.
• <quoted-search-expression> is a valid search expression that contains quotes.
• <eval-expression> is a valid eval expression that evaluates to a boolean.

Examples:

search expression: (name="foo bar")• search expression: "user=mildred"• search expression: ("search literal")• eval bool expression: eval(distance/time < max_speed)•

Transactions and macro search

Transactions and macro searches are a powerful combination that allows substitution into your transaction searches. Make a transaction search and then save it with $field$ to allow substitution.

For an example of how to use macro searches and transactions, see "Create and use search macros" in the User manual. For more information about macro searches, see "Design macro searches" in this manual.

Example transaction search

Run a search that groups together all of the web pages a single user (or client IP address) looked at over a time range.

This search takes events from the access logs, and creates a transaction from events that share the same clientip value that occurred within 5 minutes of each other (within a 3 hour time span).

sourcetype=access_combined | transaction clientip maxpause=5m maxspan=3h

Define transactions

Any series of events can be turned into a transaction type. Read more about use cases in "About transactions", in this manual.

You can create transaction types via transactiontypes.conf. See below for configuration details.


For more information on configuration files in general, see "About configuration files" in the Admin manual.

Configure transaction types in transactiontypes.conf

1. Create a transactiontypes.conf file in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/.

2. Define transactions by creating a stanza and listing specifications for each transaction within its stanza. Use the following attributes:

[<transactiontype>]
maxspan = [<integer> s|m|h|d]
maxpause = [<integer> s|m|h|d]
fields = <comma-separated list of fields>
exclusive = <true | false>
match = closest

[<TRANSACTIONTYPE>]

• Create any number of transaction types, each represented by a stanza name and any number of the following attribute/value pairs.

• Use the stanza name, [<TRANSACTIONTYPE>], to search for the transaction in Splunk Web.
• If you do not specify an entry for each of the following attributes, Splunk uses the default value.

maxspan = [<integer> s|m|h|d]

• Set the maximum time span for the transaction.
• Can be in seconds, minutes, hours or days. For example: 5s, 6m, 12h or 30d.
• Defaults to 5m.

maxpause = [<integer> s|m|h|d]

• Set the maximum pause between the events in a transaction.
• Can be in seconds, minutes, hours or days. For example: 5s, 6m, 12h or 30d.
• Defaults to 2s.

fields = <comma-separated list of fields>

• If set, each event must have the same field(s) to be considered part of the same transaction.
• Defaults to "".

exclusive = <true | false>

• Toggle whether events can be in multiple transactions, or 'exclusive' to a single transaction.
• Applies to 'fields' (above).


• For example, if fields=url,cookie, and exclusive=false, then an event with a 'cookie', but not a 'url' value could be in multiple transactions that share the same 'cookie', but have different URLs.

• Setting exclusive = false causes the matcher to look for multiple matches for each event and approximately doubles the processing time.

Defaults to "true".•

match = closest

• Specify the match type to use.
• Currently, the only value supported is "closest."
• Defaults to "closest."

3. Use the transaction command in Splunk Web to call your defined transaction (by its transaction type name). You can override configuration specifics during search.
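For example, here is a minimal sketch that groups web access events by client IP (the stanza name weblog_session is hypothetical). In transactiontypes.conf:

[weblog_session]
fields = clientip
maxspan = 3h
maxpause = 5m

Then, assuming the transaction command's name option for calling a configured type, you could search:

sourcetype=access_combined | transaction name=weblog_session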

For more information about searching for transactions, see "Search for transactions" in this manual.


Data enrichment: Lookups and workflow actions

About lookups and workflow actions

Lookups and workflow actions enable you to enrich and extend the usefulness of your event data through interactions with external resources.

Lookup tables

Lookup tables use information in your events to determine how to add other fields from external data sources such as static tables (CSV files) and Python-based commands. It's also possible to create lookups that add fields based on time information.

A really basic example of this functionality would be a static lookup that takes the http_status value in an event, matches that value with its definition in a CSV file, and then adds fields such as status_description to the event. So if you have an event where http_status = 503, the lookup would add status_description = Service Unavailable and status_type = Server Error to that event.

Of course, there are more advanced ways to work with lookups. For example, you can:

• Arrange to have a static lookup table be populated by the results of a saved search.
• Define a field lookup that is based on an external Python script rather than a lookup table. For example, you could create a lookup that uses a Python script that returns an IP address when given a host name, and returns a host name when given an IP address.

• Create a time-based lookup, if you are working with a lookup table that includes a field value that represents time. For example, this could come in handy if you need to use DHCP logs to identify users on your network based on their IP address and the event timestamp.

For more information, see "Look up fields from external data sources," in this chapter.

Workflow actions

Workflow actions enable you to set up interactions between specific fields in your data and other applications or web resources. A really simple workflow action would be one that is associated with an IP_address field, which, when launched, opens an external WHOIS search in a separate browser window based on the IP_address value.

You can also set up workflow actions that:

• Apply only to particular fields (as opposed to all fields in an event).
• Apply only to events belonging to a specific event type or group of event types.
• Are accessed either via event dropdown menus, field dropdown menus, or both.
• Perform HTTP GET requests, enabling you to pass information to an external web resource, such as a search engine or IP lookup service.

• Perform HTTP POST requests that can send field values to an external resource. For example, you could design one that sends a status value to an external issue-tracking application.
• Take certain field values from a chosen event and insert them into a secondary search that is populated with those field values and which launches in a secondary browser window.

For information about setting up workflow actions in Manager, see "Create workflow actions in Splunk Web", in this chapter.

Look up fields from external data sources

Use the dynamic fields lookup feature to add fields to your events with information from an external source, such as a static table (CSV file) or an external (Python) command. You can also add fields based on matching time information.

For example, if you are monitoring logins with Splunk and have IP addresses and timestamps for those logins in your Splunk index, you can use a dynamic field lookup to map the IP address and timestamp to the MAC address and username information for the matching IP and timestamp data that you have in your DHCP logs.

You can set up a lookup using the Lookups Manager page in Splunk Web or by configuring stanzas in props.conf and transforms.conf. For more information about using the Lookups Manager, see the fields lookup tutorial in the User Manual. This topic discusses how to use props.conf and transforms.conf to set up your lookups.

To set up a lookup using the configuration files:

Important: Do not edit conf files in $SPLUNK_HOME/etc/system/default. Instead, you should edit the file in $SPLUNK_HOME/etc/system/local/ or $SPLUNK_HOME/etc/apps/<app_name>/local/. If the file doesn't exist, create it.

1. Edit transforms.conf to define your lookup table.

Currently you can define two kinds of lookup tables: static lookups (which utilize CSV files) and external lookups (which utilize Python scripts). The arguments you use in your transforms stanza indicate the type of lookup table you want to define. Use filename for static lookups and external_cmd for external lookups.

Note: A lookup table must have at least two columns. Each column may have multiple instances of the same value (multi-valued fields).

2. Edit props.conf to apply your lookup table.

This step is the same for both static and external lookups. In this configuration file, you specify the fields to match and output (or OUTPUTNEW, if you don't want to overwrite the output field) from the lookup table that you defined in transforms.conf.

You can have more than one field lookup defined in a single source stanza. Each lookup should have its own unique lookup name; for example, if you have multiple tables, you can name them:


LOOKUP-table1, LOOKUP-table2, etc., or something more descriptive.

When you add a lookup to props.conf, the lookup is run automatically. If your automatic lookup is very slow, it will also impact the speed of your searches.

3. Restart Splunk to implement the changes you made to the configuration files.

After restart, you should see the output fields from your lookup table listed in the fields picker. From there, you can select the fields to display in each of the matching search results.

Set up a fields lookup based on a static file

The simplest fields lookup is based on a static table, specifically a CSV file. The CSV file needs to be located in one of two places:

• $SPLUNK_HOME/etc/system/lookups/
• $SPLUNK_HOME/etc/apps/<app_name>/lookups/

Create the lookups directory if it does not exist.

1. Edit transforms.conf to define your lookup table.

In transforms.conf, add a stanza to define your lookup table. The name of the stanza is also the name of your lookup table. You will use this transform in props.conf.

In this stanza, reference the CSV file's name:

[myLookup]
filename = <filename>
max_matches = <integer>

Optionally, you can specify the number of matching entries to apply to an event; max_matches indicates that the first (in file order) <integer> number of entries are used. By default, max_matches is 100 for lookups that are not based on a timestamp field.

2. Edit props.conf to apply your lookup table.

In props.conf, add a stanza with the lookup key. This stanza specifies the lookup table that you defined in transforms.conf and indicates how Splunk should apply it to your events:

[<stanza name>]
LOOKUP-<name> = $TRANSFORM <match_field_in_table> OUTPUT|OUTPUTNEW <output_field_in_table>

• stanza name is the sourcetype, host, or source to which this lookup applies, as specified in props.conf.

• stanza name can't use regex-type syntax.
• $TRANSFORM references the stanza in transforms.conf where you defined your lookup table.


• match_field_in_table is the column in the lookup table that you use to match values.
• output_field_in_table is the column in the lookup table that you add to your events. Use OUTPUTNEW if you don't want to overwrite existing values in your output field.

• You can have multiple columns on either side of the lookup. For example, you could have $TRANSFORM <match_field1>, <match_field2> OUTPUT|OUTPUTNEW <output_field1>, <output_field2>. You can also have one field return two fields, three fields return one field, and so on.

Use the AS clause if the field names in the lookup table and your events do not match, or if you want to rename the field in your event:

[<stanza name>]
LOOKUP-<name> = $TRANSFORM <match_field_in_table> AS <match_field_in_event> OUTPUT|OUTPUTNEW <output_field_in_table> AS <output_field_in_event>

You can have more than one field after the OUTPUT|OUTPUTNEW clause. If you don't use OUTPUT|OUTPUTNEW, Splunk adds all the field names and values from the lookup table to your events.

3. Restart Splunk.

Example of static fields lookup

Here's an example of setting up lookups for HTTP status codes in an access_combined log. In this example, you want to match the status field in your lookup table (http_status.csv) with the field in your events. Then, you add the status description and status type fields into your events.

The following is the http_status.csv file. You can put this into $SPLUNK_HOME/etc/apps/<app_name>/lookups/. If you're using this in the Search App, put the file into $SPLUNK_HOME/etc/apps/search/lookups/:

status,status_description,status_type
100,Continue,Informational
101,Switching Protocols,Informational
200,OK,Successful
201,Created,Successful
202,Accepted,Successful
203,Non-Authoritative Information,Successful
204,No Content,Successful
205,Reset Content,Successful
206,Partial Content,Successful
300,Multiple Choices,Redirection
301,Moved Permanently,Redirection
302,Found,Redirection
303,See Other,Redirection
304,Not Modified,Redirection
305,Use Proxy,Redirection
307,Temporary Redirect,Redirection
400,Bad Request,Client Error
401,Unauthorized,Client Error
402,Payment Required,Client Error
403,Forbidden,Client Error
404,Not Found,Client Error
405,Method Not Allowed,Client Error
406,Not Acceptable,Client Error
407,Proxy Authentication Required,Client Error
408,Request Timeout,Client Error
409,Conflict,Client Error
410,Gone,Client Error
411,Length Required,Client Error
412,Precondition Failed,Client Error
413,Request Entity Too Large,Client Error
414,Request-URI Too Long,Client Error
415,Unsupported Media Type,Client Error
416,Requested Range Not Satisfiable,Client Error
417,Expectation Failed,Client Error
500,Internal Server Error,Server Error
501,Not Implemented,Server Error
502,Bad Gateway,Server Error
503,Service Unavailable,Server Error
504,Gateway Timeout,Server Error
505,HTTP Version Not Supported,Server Error

1. In a transforms.conf file located in either $SPLUNK_HOME/etc/system/local/ or $SPLUNK_HOME/etc/apps/<app_name>/local, put:

[http_status]
filename = http_status.csv

2. In a props.conf file, located in either $SPLUNK_HOME/etc/system/local/ or $SPLUNK_HOME/etc/apps/<app_name>/local/, put:

[access_combined]
LOOKUP-http = http_status status OUTPUT status_description, status_type

3. Restart Splunk.

Now, when you run a search that returns Web access information, you will see the fields status_description and status_type listed in your fields picker menu.

Use search results to populate a lookup table

You can edit a local or app-specific copy of savedsearches.conf to use the results of a saved search to populate a lookup table.

In a saved search stanza, where the search returns a results table:

1. Add the following line to enable the lookup population action.

action.populate_lookup = 1

This tells Splunk to save your results table into a CSV file.

2. Add the following line to tell Splunk where to copy your lookup table.

action.populate_lookup.dest = <string>


The action.populate_lookup.dest value is a lookup name from transforms.conf or a path to a CSV file where Splunk should copy the search results. If it is a path to a CSV file, the path should be relative to $SPLUNK_HOME.

For example, if you want to save the results to a global lookup table, you might include:

action.populate_lookup.dest = etc/system/lookups/myTable.csv

The destination directory, $SPLUNK_HOME/etc/system/lookups or $SPLUNK_HOME/etc/apps/<app_name>/lookups, should already exist.

3. Add the following line if you want this search to run when Splunk starts up.

run_on_startup = true

If it does not run on startup, it will run at the next scheduled time. Generally, we recommend that you set this to true for scheduled searches that populate lookup tables.

Because Splunk copies the results of the saved search to a CSV file, you can set up your fields lookup the same way you set up a static lookup.
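Putting these settings together, a complete stanza might look like the following sketch (the stanza name and search string are hypothetical):

[populate status lookup]
search = sourcetype=access_combined | stats count by status
action.populate_lookup = 1
action.populate_lookup.dest = etc/system/lookups/myTable.csv
run_on_startup = true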

Set up a fields lookup based on an external command or script

For dynamic or external lookups, your transforms.conf stanza references the command or script and arguments to invoke. This is also called a scripted or external lookup.

You can also specify the type of command or script to invoke:

[myLookup]
external_cmd = <string>
external_type = python
fields_list = <string>
max_matches = <integer>

Use fields_list to list all the fields supported by the external command, delimited by a comma and space.

Note: Currently, Splunk only supports Python scripts for external lookups. Python scripts used for these lookups must be located in one of two places:

• $SPLUNK_HOME/etc/apps/<app_name>/bin
• $SPLUNK_HOME/etc/searchscripts

Note: When writing your Python script, if you refer to any external resources (such as a file), the reference must be relative to the directory where the script is located.

Example of external fields lookup

Here's an example of how you might use external lookups to match with information from a DNS server. Splunk ships with a script located in $SPLUNK_HOME/etc/system/bin/ called external_lookup.py, which is a DNS lookup script that:


• if given a host, returns the IP address.
• if given an IP address, returns the host name.

1. In a transforms.conf file, put:

[dnsLookup]
external_cmd = external_lookup.py host ip
fields_list = host, ip

2. In a props.conf file, put:

[access_combined]
LOOKUP-dns = dnsLookup host OUTPUT ip AS clientip

The field in the lookup table is named ip, but Splunk automatically extracts the IP addresses from Web access logs into a field named clientip. So, "OUTPUT ip AS clientip" indicates that you want Splunk to add the values of ip from the lookup table into the clientip field in the events. Since the host field has the same name in the lookup table and the events, you don't need to rename the field.

For a reverse DNS lookup, your props.conf stanza would be:

[access_combined]
LOOKUP-rdns = dnsLookup ip AS clientip OUTPUTNEW host AS hostname

For this example, instead of overwriting the host field value, you want Splunk to return the host value in a new field, called hostname.

3. Restart Splunk.

More about the external lookup script

When designing your external lookup script, keep in mind that it needs to take in a partially empty CSV file and output a filled-in CSV file. The arguments that you pass to the script are the headers for these input and output files.

In the DNS lookup example above, the CSV file contains 2 fields, "host" and "ip". The fields that you pass to this script are the ones you specify in transforms.conf:

external_cmd = external_lookup.py host ip

Note: If you don't pass these arguments, the script will return an error.

When you run the search command:

... | lookup dnsLookup host

You're telling Splunk to use the lookup table that you defined in transforms.conf as [dnsLookup] and pass into the external command script the values for the "host" field as a CSV file, which may look like this:

host,ip
work.com
home.net

Basically, this is a CSV file with the headers "host" and "ip", but missing values for ip. The two headers are included because they are the fields you specified in the fields_list parameter of transforms.conf.

The script then outputs the following CSV file and returns it to Splunk, which populates the ip field in your results:

host,ip
work.com,127.0.0.1
home.net,127.0.0.2
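To make that contract concrete, here is a minimal sketch of an external lookup script (a hypothetical my_lookup.py illustrating the CSV-in/CSV-out interface described above; it is not the shipped external_lookup.py):

#!/usr/bin/env python
# Hypothetical my_lookup.py: reads a partially empty CSV on stdin,
# fills in the missing host or ip column, and writes the completed
# CSV to stdout.
import csv
import socket
import sys

def main(host_field, ip_field):
    reader = csv.DictReader(sys.stdin)
    writer = csv.DictWriter(sys.stdout, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        try:
            if row.get(host_field) and not row.get(ip_field):
                # Forward lookup: fill in the missing IP address.
                row[ip_field] = socket.gethostbyname(row[host_field])
            elif row.get(ip_field) and not row.get(host_field):
                # Reverse lookup: fill in the missing host name.
                row[host_field] = socket.gethostbyaddr(row[ip_field])[0]
        except socket.error:
            pass  # Leave the row unfilled if the lookup fails.
        writer.writerow(row)

if __name__ == "__main__":
    # transforms.conf passes the field names as arguments, e.g. "host ip".
    main(sys.argv[1], sys.argv[2])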

Set up a time-based fields lookup

If your static or external lookup table has a field value that represents time, you can use this time field to set up your fields lookup. For time-based (or temporal) lookups, add the following lines to your lookup stanza in transforms.conf:

time_field = <field_name>
time_format = <string>

If time_field is present, by default max_matches is 1. Also, the first matching entry in descending order is applied.

Use the time_format key to specify the strptime format of your time_field. By default, time_format is UTC.

For a match to occur with time-based lookups, you can also specify offsets for the minimum and maximum amounts of time that an event may be later than a lookup entry. To do this, add the following lines to your stanza:

max_offset_secs = <integer>
min_offset_secs = <integer>

By default, there is no maximum offset and the minimum offset is 0.

Example of time-based fields lookup

Here's an example of how you might use DHCP logs to identify users on your network based on their IP address and the timestamp. Let's say the DHCP logs are in a file, dhcp.csv, which contains the timestamp, IP address, and the user's name and MAC address.

1. In a transforms.conf file, put:

[dhcpLookup]
filename = dhcp.csv
time_field = timestamp
time_format = %d/%m/%y %H:%M:%S

2. In a props.conf file, put:


[dhcp]
LOOKUP-table = dhcpLookup ip mac OUTPUT user

3. Restart Splunk.

Troubleshooting lookups - Using identical names in lookup stanzas

Lookup table definitions are indicated with the attribute LOOKUP-<name>. In general it's best if all of your lookup stanzas have different names to reduce the chance of things going wrong. When you do give the same name to two or more lookups you can run into trouble unless you know what you're trying to do:

• If two or more lookups with the same name share the same stanza (the same host, source, or sourcetype), the first such lookup in props.conf overrides the others. All lookups with the same host, source, or sourcetype should have different names.

• If you have lookups with different stanzas (different hosts, sources, or sourcetypes) that share the same name, you can end up with a situation where only one of them seems to work at any given point in time. You may set this up on purpose, but in most cases it's probably not very convenient.

For example, say you have two lookups that share "table" as their name:

[host::machine_name]
LOOKUP-table = logs_per_day host OUTPUTNEW average_logs AS logs_per_day

[sendmail]
LOOKUP-table = location host OUTPUTNEW building AS location

Any events that overlap between these two lookups will only be affected by one of them. In other words:

• Events that match the host will get the host lookup.
• Events that match the sourcetype will get the sourcetype lookup.
• Events that match both will get only the host lookup.

When you name your lookup LOOKUP-table, you're saying this is the lookup that achieves some purpose or action described by "table". In this example, these lookups are intended to achieve different goals--one determines something about logs per day, and the other has something to do with location. You might instead rename them:

[host::machine_name]
LOOKUP-table = logs_per_day host OUTPUTNEW average_logs AS logs_per_day

[sendmail]
LOOKUP-location = location host OUTPUTNEW building AS location

Now you have two different settings that won't collide.

Create workflow actions in Splunk Web

Enable a wide variety of interactions between indexed fields and other web resources with workflow actions. Workflow actions have a wide variety of applications. For example, you can define workflow actions that enable you to:

• Perform an external WHOIS lookup based on an IP address found in an event.
• Use the field values in an HTTP error event to create a new entry in an external issue management system.
• Perform an external search (using Google or a similar web search application) on the value of a specific field found in an event.
• Launch secondary Splunk searches that use one or more field values from selected events.

In addition, you can define workflow actions that:

• Are targeted to events that contain a specific field or set of fields, or which belong to a particular event type.
• Appear either in field menus or event menus in search results. You can also set them up to appear only in the menus of specific fields, or in all field menus in a qualifying event.
• When selected, open either in the current window or in a new one.

Define workflow actions using Splunk Manager

You can set up all of the workflow actions described in the bulleted list at the top of this topic and many more using Splunk Manager. To begin, go to the Manager page and click Fields. From there you can go to the Workflow actions page to review and update existing workflow actions, or you can just click Add new for workflow actions to create a new one. Both methods take you to the workflow action detail page, which is where you define individual workflow actions.

If you're creating a new workflow action, you need to give it a Name and identify its Destination app.

There are three kinds of workflow actions that you can set up:

• GET workflow actions, which create typical HTML links to do things like perform Google searches on specific values or run domain name queries against external WHOIS databases.

• POST workflow actions, which generate an HTTP POST request to a specified URI. This action type enables you to do things like create entries in external issue management systems using a set of relevant field values.

• Search workflow actions, which launch secondary searches that use specific field values from an event, such as a search that looks for the occurrence of specific combinations of ipaddress and http_status field values in your index over a specific time range.

Target workflow actions to a narrow grouping of events

When you create workflow actions in Manager, you can optionally target them to a narrow grouping of events. You can restrict workflow action scope by field, by event type, or by a combination of the two.

Narrow workflow action scope by field

You can set up workflow actions that only apply to events that have a specified field or set of fields. For example, if you have a field called http_status, and you would like a workflow action to apply only to events containing that field, you would declare http_status in the Apply only to the following fields setting.

If you want to have a workflow action apply only to events that have a set of fields, you can declare a comma-delimited list of fields in Apply only to the following fields. When more than one field is listed, the workflow action is displayed only if the entire list of fields is present in the event.

For example, say you want a workflow action to only apply to events with ip_client and ip_server fields. To do this, you would enter ip_client, ip_server in Apply only to the following fields.

Workflow action field scoping also supports use of the wildcard asterisk. For example, if you declare a simple field listing of ip_*, Splunk applies the resulting workflow action to events with either ip_client or ip_server (or both), as well as any other event with a field that matches ip_*.

By default the field list is set to *, which means that it matches all fields.

If you need more complex selection logic, we suggest you use event type scoping instead of field scoping, or combine event type scoping with field scoping.

Narrow workflow action scope by event type

Event type scoping works exactly the same way as field scoping. You can enter a single event type or a comma-delimited list of event types into the Apply only to the following event types setting to create a workflow action that Splunk only applies to events belonging to that event type or set of event types. You can also use wildcard matching to identify events belonging to a range of event types.

You can also narrow the scope of workflow actions through a combination of fields and event types. For example, perhaps you have a field called http_status, but you only want the resulting workflow action to appear in events containing that field if the http_status is greater than or equal to 500. To accomplish this you would first set up an event type called errors_in_500_range that is applied to events matching a search like

http_status >= 500

You would then define a workflow action that has Apply only to the following fields set to http_status and Apply only to the following event types set to errors_in_500_range.
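For reference, the eventtypes.conf stanza backing such an event type might look something like the following sketch (see "Configure event types directly in eventtypes.conf" in this manual for the authoritative syntax):

[errors_in_500_range]
search = http_status >= 500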

For more information about event types, see "About event types" in this manual.

Control workflow action appearance in field and event menus

When workflow actions are set up correctly, they appear in dropdown menus associated with fields and events in your search results. For example, you can define a workflow action that sets off a Google search for values of the topic field in events. (The topic field turns up in webserver events associated with the access of Splunk documentation topics. It has the name of a particular Splunk documentation topic as its value.)

Depending on how you define the Google search workflow action in Manager, you can have it appear in field menus for events containing a topic field:

Alternatively, you can have the workflow action appear in the event menus for those same events:

Or you can choose to have it appear in both the event menu and the field menus for events containing a topic field.

Note that in the event depicted above, the topic field has a value of LicenseManagement. The menus for this event display the workflow action Google LicenseManagement. Clicking on this workflow action sets off a Google search for the term LicenseManagement. This is an example of a "GET link" workflow action, and it's one of three kinds of workflow actions that you can implement in Splunk. Read on for instructions on setting up all three.

Set up a GET workflow action

GET link workflow actions drop one or more values into an HTML link. Clicking that link performs an HTTP GET request in a browser, allowing you to pass information to an external web resource, such as a search engine or IP lookup service.

To define a GET workflow action, go to the detail page, set Action type to link, and set Link method to get. Then define a Label and URI as appropriate.

Note: Variables passed in GET actions via URIs are automatically URL encoded during transmission. This means you can include values that have spaces between words or punctuation characters. However, if you're working with a field that has an HTTP address as its value, and you want to pass the entire field value as a URI, you should use the $! prefix to keep Splunk from escaping the field value. See "Use the $! prefix to prevent escape of URL or HTTP form field values" below for more information.

Here's an example of the setup for a GET link workflow action that sets off a Google search on values of the topic field in search results:

The Label field enables you to define the text that is displayed in either the field or event workflow menu. Labels can be static or include the value of relevant fields. For example, if you have a field called topic in your events and you want its value to be included in the label for a Google workflow action, you might set the Label value to Google $topic$.

In the above example, if the value for topic in an event is CreatefieldactionsinSplunkWeb, the field action displays as Google CreatefieldactionsinSplunkWeb in the topic field menu.

The URI field enables you to define the location of the external resource that you want to send your field values to. Similar to the Label setting, when you declare the value of a field, you use the name of the field enclosed by dollar signs. In the above example, this URI uses the GET method to submit the topic value to Google for a search.
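Based on the special-parameter variant of this example later in this topic, the URI for this Google search would look something like:

http://www.google.com/search?q=$topic$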

You can choose whether the workflow action displays in the event menu, the field menu(s), or both. You can also identify whether the link opens in the current window or a new window.

You can also arrange for the workflow action to apply only to a specific set of events. You can indicate that the workflow action only appears in events that have a particular set of fields or which belong to a specific event type or set of event types.

Example - Provide an external IP lookup

You have configured your Splunk app to extract domain names in web services logs and specify them as a field named domain. You want to be able to search an external WHOIS database for more information about the domains that appear.

Here's how you would set up the GET workflow action that helps you with this.

In the Workflow actions details page, set Action type to link and set Link method to get.

You then use the Label and URI fields to identify the field involved. Set a Label value of WHOIS: $domain$. Set a URI value of http://whois.net/whois/$domain$.

After that, you can determine:

• Whether the link shows up in the field menu, the event menu, or both.
• Whether the link opens the WHOIS search in the same window or a new one.
• Restrictions for the events that display the workflow action link. You can target the workflow action to events that have specific fields, that belong to specific event types, or some combination of the two.

Set up a POST workflow action

POST workflow actions are set up in a manner similar to that of GET link actions. Go to the workflow action detail page, set Action type to link, set Link method to post, and define a Label and URI as appropriate.

However, POST requests are typically defined by a form element in HTML along with some inputs that are converted into POST arguments. This means that you have to identify POST arguments to send to the identified URI.

Note: Variables passed in POST link actions via URIs are automatically HTTP-form encoded during transmission. This means you can include values that have spaces between words or punctuation characters. However, if you're working with a field that has an HTTP address as its value, and you want to pass the entire field value as a URI, you should use the $! prefix to keep Splunk from escaping the field value. See "Use the $! prefix to prevent escape of URL or HTTP form field values" below for more information.

These arguments are key and value combinations that will be sent to a web resource that responds to POST requests. On both the key and value sides of the argument, you can use field names enclosed in dollar signs to identify the field value from your events that should be sent over to the resource. You can define multiple key/value arguments in one POST workflow action.

Example - Allow an HTTP error to create an entry in an issue tracking application

You've configured your Splunk app to extract HTTP status codes from a web service log as a field called http_status. Along with the http_status field, the events typically contain either a normal single-line request description or a multiline Python stacktrace originating from the Python process that produced an error.

You want to design a workflow action that only appears for error events where http_status is in the 500 range. You want the workflow action to send the associated Python stacktrace and the HTTP status code to an external issue management system to generate a new bug report. However, the issue management system only accepts POST requests to a specific endpoint.

Here's how you might set up the POST workflow action that fits your requirements:

Note that the first POST argument sends server error $http_status$ to a title field in the external issue tracking system. If you select this workflow action for an event with an http_status of 500, then it opens an issue with the title server error 500 in the issue tracking system.

The second POST argument uses the _raw field to include the multiline Python stacktrace in the description field of the new issue.

Finally, note that the workflow action has been set up so that it only applies to events belonging to the errors_in_500_range event type. This event type is only applied to events carrying http_status values in the typical HTTP error range of 500 or greater. Events with HTTP error codes below 500 do not display the submit error report workflow action in their event or field menus.
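Although the exact setup depends on the issue tracker's endpoint, the two POST arguments described above would look something like this in the detail page's key/value argument fields (the key names title and description are whatever the external system expects, not fixed Splunk names):

title = server error $http_status$
description = $_raw$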

Set up a secondary search that is dynamically populated with field values from an event

To set up workflow actions that launch dynamically populated secondary searches, you start by setting Action type to search on the Workflow actions detail page. This reveals a set of fields that you use to define the specifics of the search.

In Search string, enter a search string that includes one or more placeholders for field values, bounded by dollar signs. For example, if you're setting up a workflow action that searches on client IP values that turn up in events, you might simply enter clientip=$clientip$ in that field.

Identify the app that the search runs in. If you want it to run in a view other than the current one, select that view. And as with all workflow actions, you can determine whether it opens in the current window or a new one.

Be sure to set a time range for the search (or identify whether it should use the same time range as the search that created the field listing) using the Earliest time and Latest time fields. If left blank, the search runs over all time by default.

Finally, as with other workflow action types, you can restrict the search workflow action to events containing specific sets of fields and/or which belong to particular event types.

Example - Launch a secondary search that finds errors originating from a specific Ruby on Rails controller

Say your company uses a web infrastructure that is built on Ruby on Rails. You've set up an event type to sort out errors related to Ruby controllers (titled controller_error), but sometimes you just want to see all the errors related to a particular controller. Here's how you might set up a workflow action that does this:

1. On the Workflow actions detail page, set up an action with the following Label: See other errors for controller $controller$ over past 24h.

2. Set Action type to Search.

3. Enter the following Search string: sourcetype=rails controller=$controller$ error=*

4. Set an Earliest time of -24h. Leave Latest time blank.

5. Using the Apply only to the following... settings, arrange for the workflow action to only appear in events that belong to the controller_error event type, and which contain the error and controller fields.

Those are the basics. You can also determine which app or view the workflow action should run in (for example, you might have a dedicated view for this information titled ruby_errors) and identify whether the action works in the current window or opens a new one.

Use special parameters in workflow actions

Splunk provides special parameters for workflow actions that begin with an "@" sign. Two of these special parameters are for field menus only. They enable you to set up workflow actions that apply to all fields in a qualifying event.

• @field_name - Refers to the name of the field being clicked on.
• @field_value - Refers to the value of the field being clicked on.

The other special parameters are:

• @sid - Refers to the sid of the search job that returned the event.
• @offset - Refers to the offset of the event in the search job.
• @namespace - Refers to the namespace from which the search job was dispatched.
• @latest_time - Refers to the latest time the event occurred. It is used to distinguish similar events from one another. It is not always available for all fields.

Example - Create a workflow action that applies to all fields in an event

You can update the Google search example discussed above (in the GET link workflow action section) so that it enables a search of the field name and field value for every field in an event to which it applies. All you need to do is change the title to Google this field and value and replace the URI of that action with http://www.google.com/search?q=$@field_name$+$@field_value$.

This results in a workflow action that searches on whichever field/value combination you're viewing a field menu for. If you're looking at the field menu for topic=WhatisSplunkknowledge and select the Google this field and value field action, the resulting Google search is topic WhatisSplunkknowledge.

Remember: Workflow actions using the @field_name and/or @field_value parameters are not compatible with event-level menus.

Example - Show the source of an event

This workflow action uses the other special parameters to show the source of an event in your raw search data.

The Action type is link and its Link method is get. Its Title is Show source. The URI is /app/$@namespace$/show_source?sid=$@sid$&offset=$@offset$&latest_time=$@latest_time$.

It's targeted to events that have the following fields: _cd, source, host, index.

Try setting this workflow action up in your app (if it isn't installed already) and see how it works.

Use the $! prefix to prevent escape of URL or HTTP form field values

When you define fields to be used in workflow actions, it is often necessary to escape these fields so they can be safely passed via HTTP to some other external endpoint. Sometimes this escaping may be undesirable. In these cases, you can use the $! prefix to prevent Splunk from automatically escaping the field value. In the case of GET workflow actions, it prevents URL escaping. In the case of POST workflow actions, it prevents HTTP form escaping.

Example - Passing an HTTP address to a separate browser window

Say you have a GET workflow action that works with a field named http, which has fully formed HTTP addresses as values. This workflow action is designed to simply open a new browser window pointing at the HTTP address value of the http field. This won't work if the new window is opened with an escaped HTTP address, so you use the $! prefix. Where you might normally set the URI field to $http$ for this workflow action in Manager, you instead set it to $!http$ to keep the HTTP address from being escaped.

Configure workflow actions through workflow_actions.conf

This topic is coming soon. In the meantime, learn how to set up and administrate workflow actions via Manager.

Data normalization: Tags and aliases

About tags and aliases

In your data, you might have groups of events with related field values. To help you search more efficiently for these particular groups of event data, you can assign tags to their field values. You can assign one or more tags to any field/value combination (including event type, host, source, or source type).

You can use tags to:

• Help you track abstract field values, like IP addresses or ID numbers. For example, you could have an IP address related to your main office with the value 192.168.1.2. Tag that IPaddress value as mainoffice, and then search on that tag to find events with that IP address.

• Use one tag to group a set of field values together, so you can search on them with one simple command. For example, you might find that you have two host names that relate to the same computer. You could give both of those values the same tag. When you search on that tag, Splunk returns events involving both host name values.

• Give specific extracted fields multiple tags that reflect different aspects of their identity, which enables you to perform tag-based searches that help you quickly narrow down the results you want. To understand how this could work, see the following example.

Example:

Let's say you have an extracted field called IPaddress, which refers to the IP addresses of the data sources within your company intranet. You can make IPaddress useful by tagging each IP address based on its functionality or location. You can tag all of your routers' IP addresses as router. You can also tag each IP address based on its location, for example: SF or Building1. An IP address of a router located in San Francisco inside Building 1 could have the tags router, SF, and Building1.

To search for all routers in San Francisco that are not in Building1, you'd search for the following:

tag=router tag=SF NOT (tag=Building1)

Define and manage tags

Splunk provides a set of methods for tag creation and management. Most users will go with the simplest method--tagging field/value pairs directly in search results, a method discussed in detail in "Tag and alias field values" in the User manual.

However, as a knowledge manager, you'll probably be using the Tags pages in Manager to curate the various collections of tags created by users of your Splunk implementation. This topic explains how to:

• Use the Tags pages in Manager to manage tags for your Splunk implementation.
• Create new tags through Manager.
• Disable or delete tags with Manager.

Navigate to the Tags pages by selecting Manager > Tags.

Using the Tags pages in Manager

The Tags pages in Manager provide three views of the tags in your Splunk implementation:

• Tags by field value pair(s), which you access by clicking List by field value pair(s) on the Tags page.
• List by tag name.
• Tags by unique ID, which you access by clicking All tag objects on the Tags page.

Each of these pages enables you to manage your tag collection in different ways. They enable you to quickly get a picture of the associations that have been made between tags and field/value pairs over time. They also allow you to create and remove these associations.

Managing tag sets associated with specific field value pairs

What if you want to see a list of all of the field/value pairs in your system that have tags associated with them? Furthermore, what if you want to review and even update the set of tags that are associated with a specific field/value pairing? Or define a set of tags for a particular field/value pair?

The Tags by field value pair(s) Manager page is the answer to these questions. It enables you to review and edit the tag sets that have been associated with particular field/value pairs.

You can also use this page to manage the permissions around the ability to manage a particular field/value combination with tags.

To see the list of tags for a specific field/value pair, locate that pairing and click on it in the Field::Value column. This takes you to the detail page for the field/value pair.

Here's an example of a set of tags that have been defined for the eventtype=auditd_create field/value pair:

You can add more tags, and delete them as well (if you have the permissions to do so).

When you click New on the Tags by field value pair(s) page, the system enables you to define a set of tags for a new field/value pair.

When you create or update a tag list for a field/value pairing, keep in mind that you may be creating new tags, or associating existing tags with a different kind of field/value pair than they were originally designed to work with. As a knowledge manager you should consider sticking to a carefully designed and maintained set of tags. This practice aids with data normalization, and can reduce confusion on the part of your users. (For more information see the "Organize and administrate knowledge objects" chapter of this manual.)

Note: You may want to verify the existence of a field/value pair that you add to the Tags by field value pair(s) page. The system will not prevent you from defining a list of tags for a nonexistent field/value pair.

Reviewing and updating sets of field value pairs associated with specific tags

What if you want to see a list of all of the tags in your system that have one or more field/value pairs associated with them? Furthermore, what if you want to review and even update the set of field/value pairings that are associated with a specific tag? Or define a set of field/value pairings for a new tag?

These questions are answered by the List by tag name Manager page. It enables you to review and edit the sets of field/value pairs that have been associated with specific tags.

This page does not allow you to manage permissions for the set of field/value pairs associated with a tag, however.

To see the list of field/value pairings for a particular tag, locate the tag in the List by tag name page, and click on the tag name in the Tag column. This takes you to the detail page for the tag.

Here's an example displaying the various field/value pairings that the modify tag has been associated with.

You can add field/value associations, and delete them as well (if you have the permissions to do so).

When you click New on the List by tag name page, the system enables you to define a set of field/value pairings for a new tag.

When you create or update a set of field/value pairings for a tag, keep in mind that you may be creating new field/value pairings. You may want to verify the existence of field/value pairs that you associate with a tag. The system will not prevent you from adding nonexistent field/value associations.

Be wary of creating new tags. Tags may already exist that serve the purpose you're trying to address. As a knowledge manager you should consider sticking to a carefully designed and maintained set of tags. This practice aids with data normalization, and can reduce confusion on the part of your users. (For more information see the "Organize and administrate knowledge objects" chapter of this manual.)

Reviewing all unique field/value pair and tag combinations

The Tags by unique ID page breaks out all of the unique tag name and field/value pairings in your system. Unlike the previous two pages, this page only lets you edit one-to-one relationships between tags and field/value pairs.

You can search on a particular tag to quickly see all of the field/value pairs with which it's associated, or vice versa. This page is especially useful if you want to disable or clone a particular tag and field/value association, or if you want to maintain permissions at that level of granularity.

Disabling and deleting tags

If you have a tag that you no longer want to use, or no longer want associated with a particular field/value pairing, you have the option of either disabling it or removing it. If you have the permissions to do so, you can:

• Remove a tag association for a specific field/value pair in the search results.
• Bulk disable or delete a tag, even if it is associated with multiple field values, via the List by tag name page.
• Bulk disable or delete the associations between a field/value pair and a set of tags via the Tags by field value pair(s) page.

For information about deleting tag associations with specific field/value pairs in your search results, see "Tag and alias field values" in the User manual.

Delete a tag with multiple field/value pair associations

You can use Splunk Manager to completely remove a tag from your system, even if it is associated with dozens of field/value pairs. This method enables you to get rid of all of these associations in one step.

Navigate to Manager > Tags > List by tag name. Delete the tag. If you don't see a delete link for the tag, you don't have permission to delete it. When you delete tags, try to be aware of downstream dependencies that may be adversely affected by their removal. For more information, see "Curate Splunk knowledge with Manager" in this manual.

Note: You can also go into the edit view for a particular tag and delete a field/value pair association directly.

Disable or delete the associations between a field/value pairing and a set of tags

Use this method to bulk-remove the set of tags that is associated with a field/value pair. This method enables you to get rid of these associations in a single step. It does not remove the field/value pairing from your data, however.

Navigate to Manager > Tags > Tags by field value pair(s). Delete the field/value pair. If you don't see a delete link for the field/value pair, you don't have permission to delete it. When you delete these associations, try to be aware of downstream dependencies that may be adversely affected by their removal. For more information, see "Curate Splunk knowledge with Manager" in this manual.

Note: You can also go into the edit view for a particular field value and delete a tag association directly.

Disable tags

Depending on your permissions, you can also disable tag and field/value associations using the three Tags pages in Manager. When an association between a tag and a field/value pair is disabled, it stays in the system but is inactive until it is enabled again.

Create aliases for fields

You can create multiple aliases for a field. The original field is not removed. This process enables you to search for the original field using any of its aliases.

Important: Field aliasing is performed after key/value extraction but before field lookups. Therefore, you can specify a lookup table based on a field alias. This can be helpful if there are one or more fields in the lookup table that are identical to fields in your data, but have been named differently. For more information read "Look up fields from external data sources" in this manual.

You can define aliases for fields that are extracted at index time as well as those that are extracted at search time.

You add your field aliases to props.conf, which you edit in $SPLUNK_HOME/etc/system/local/, or your own custom application directory in $SPLUNK_HOME/etc/apps/. (We recommend using the latter directory if you want to make it easy to transfer your data customizations to other index servers.)

Note: Splunk's field aliasing functionality does not currently support multivalue fields.

To alias fields:

1. Add the following line to a stanza in props.conf:

FIELDALIAS-<class> = (<orig_field_name> AS <new_field_name>)+

• <orig_field_name> is the original name of the field.
• <new_field_name> is the alias to assign to the field.
• You can include multiple field alias renames in one stanza.

2. Restart Splunk for your changes to take effect.

Example of field alias additions for a lookup

Say you're creating a lookup for an external static table CSV file where the field you've extracted at search time as "ip" is referred to as "ipaddress." In the props.conf file where you've defined the extraction, you would add a line that defines "ipaddress" as an alias for "ip," as follows:

[accesslog]
EXTRACT-extract_ip = (?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
FIELDALIAS-extract_ip = ip AS ipaddress

When you set up the lookup in props.conf, you can just use ipaddress where you'd otherwise have used ip:

[dns]
LOOKUP-ip = dnsLookup ipaddress OUTPUT host

For more information about search-time field extraction, see "Add fields at search time" in this manual.

For more information about field lookups, see "Create field lookups from external data sources" in this manual.

Tag the host field

Tagging the host field is useful for knowledge capture and sharing, and for crafting more precise searches. You can tag the host field with one or more words. Use this to group hosts by function or type, to enable users to easily search for all activity on a group of similar servers. If you've changed the value of the host field for a given input, you can also tag events that are already in the index with the new host name to make it easier to search across your data set.

Add a tag to the host field with Splunk Web

To add a tag to a host field/value combination in Splunk Web:

1. Perform a search for data from the host you'd like to tag.

2. In the search results, use the drop-down menu next to the host field value that you'd like to tag and choose Tag host=<current host value>.

3. The Tag This Field dialog box appears. Enter your tag or tags, separated by commas or spaces, and click Ok.

Host names vs. tagging the host field

The value of the host field is set when an event is indexed. It can be set by default based on the Splunk server hostname, set for a given input, or extracted from each event's data. Tagging the host field with an alternate hostname doesn't change the actual value of the host field, but it lets you search for the tag you specified instead of having to use the host field value. Each event can have only one host name, but multiple host tags.

For example, if your Splunk server is receiving compliance data from a specific host, tagging that host with compliance will help your compliance searches. With host tags, you can create a loose grouping of data without masking or changing the underlying host name.

You might also want to tag the host field with another host name if you indexed some data from a particular input source and then decided to change the value of the host field for that input--all the new data coming in from that input will have the new host field value, but the data that already exists in your index will have the old value. Tagging the host field for the existing data lets you search for the new host value without excluding all the existing data.

Tag event types

Tag event types to add information to your data. Any event type can have multiple tags. For example, you can tag all firewall event types as firewall, tag a subset of firewall event types as deny, and tag another subset as allow. Once an event type is tagged, any event type matching the tagged pattern will also be tagged.

Note: You can tag an event type when you create it in Splunk Web or configure it in eventtypes.conf.

Add tags to event types using Manager

Splunk Manager enables you to view and edit lists of event types.

• Click the Manager link in the upper right-hand corner.
• Select Event types.
• Locate the event type you want to tag and click on its name to go to its detail page.

Note: Keep in mind that event types are often associated with specific Splunk apps. They also have role-based permissions that can prevent you from seeing and/or editing them.

• On the detail page for the event type, add or edit tags in the Tags field.
• Click Save to confirm your changes.

Once you have tagged an event type, you can search for it in the search bar with the syntax tag::<field>=<tagname> or tag=<tagname>:

tag=foo
tag::host=*local*

Manage your search knowledge

Manage saved searches

Content coming soon

For a basic overview of saving searches and sharing them with others, see "Save searches and share search results" in the User manual.

This topic will discuss saved searches from a knowledge management perspective, including the use of the Saved search page in Manager.

Configure the priority of scheduled searches

This topic discusses the two options you can use to control the priority of concurrent scheduled searches with the search scheduler. The options are real-time scheduling and continuous scheduling:

• Real-time scheduling ensures that scheduled searches are always run over the most recent time range, even when a number of searches are scheduled to run at approximately the same time and the scheduler can only run one search concurrently. Because of the way it works, searches with real-time scheduling can end up skipping scheduled runs. However, they are always given priority over searches with continuous scheduling.

• Continuous scheduling ensures that each scheduled run of a search is eventually performed, even if the result is that those searches are delayed. These settings are managed at the saved search level via savedsearches.conf. Splunk gives all scheduled searches real-time scheduling by default, but when a scheduled search is enabled for summary indexing, Splunk automatically changes its scheduling option to continuous.

To understand the necessity of these two scheduler options, you need to understand how the search scheduler handles concurrent searches.

For more information about scheduling saved searches, see "Schedule saved searches" in the User manual.

How the search scheduler handles concurrent searches

The Splunk search scheduler limits the number of scheduled searches that can be run concurrently. The default, set by the max_searches_perc setting in limits.conf, sets the maximum number of concurrent searches that can be handled by the scheduler to 25% of the max_searches_per_cpu value. By default, max_searches_per_cpu is set to two searches for every CPU in your system plus two. So if your system only has one CPU, the scheduler can only run one search at a time (1 = 25% of 4).
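To extend that arithmetic to a larger machine: a system with four CPUs allows 2 * 4 + 2 = 10 concurrent searches overall, so the scheduler's share is 25% of 10, which works out to 2 concurrent scheduled searches (assuming the result is rounded down to a whole search).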

Note: We strongly recommend that you avoid changing limits.conf settings unless you know what you are doing.

So, if your scheduler can only run one search at a time, but you have multiple searches scheduled to run on an hourly basis over the preceding hour's data, what happens? The scheduler lines the searches up and runs them in consecutive order for the scheduled time period, but each search returns information for the time frame over which it was scheduled to run.

Example of real-time scheduling versus continuous scheduling

So, given how the scheduler works, how is real-time scheduling different from continuous scheduling, and under what conditions would you prefer one option over the other?

First, say you have two saved, scheduled searches that for the purpose of simplicity we'll call A and B:

• Search A runs every minute and takes 30 seconds to complete.
• Search B runs every 5 minutes and takes 2 minutes to complete.

Let's also say that you have a Splunk configuration that enables the search scheduler to run only one search at a time.

Both searches are scheduled to run at 1:05pm.

Time - Scheduler action

1:05:00pm - The scheduler runs search A for the 1:04 to 1:05 period, and schedules it to run again at 1:06pm. It is 1:05:30pm when search A completes.

1:05:30pm - The scheduler runs search B. Because it takes 2 minutes to run, search B won't complete until 1:07:30pm.

1:06:00pm - The scheduler wakes up and attempts to run search A, but it cannot run because search B is still in process.

1:06:59pm - The scheduler continues to attempt to run search A until 1:06:59. At this point what happens next depends on whether search A is using real-time or continuous scheduling (see below).

If search A is configured to have:

• real-time scheduling, the scheduler skips the 1:05-1:06 run of the search and schedules the next run of search A for 1:07:00pm (for the 1:06 to 1:07 period). The new search run time is based on the current scheduled run time (1:06:00pm).

• continuous scheduling, the scheduler does not advance the schedule and attempts to run the search for the 1:05 to 1:06pm period indefinitely; whatever the eventual search run time is, the next time period that search A would cover would be 1:06 to 1:07pm.

Real-time scheduling is the default for all scheduled searches. It's designed to ensure that the search returns current data. It assumes there won't be any problems if some scheduled searches are skipped, as long as the search returns up-to-the-minute results in its most recent run.

Continuous scheduling is used for situations where problems arise when there's any gap in the collection of search data. In general this is only important for searches that populate summary indexes, though you may find other uses for it. When a search is enabled for summary indexing, Splunk changes its scheduling option to continuous automatically.

Note: For more information about summary index searches, see "Use summary indexing for increased reporting efficiency" in the Knowledge Manager manual.

Configure the realtime_schedule option

The system uses the realtime_schedule option in savedsearches.conf to determine the next run time of a scheduled search. This is set individually for each saved and scheduled search.

realtime_schedule = 0 | 1

• Set realtime_schedule to 1 to use real-time scheduling. With this setting the scheduler makes sure that it is always running the search over the most recent time range. Because searches can't always run concurrently with others, this means that it may skip some search periods. This is the default value for a scheduled search.

• Set realtime_schedule to 0 to use continuous scheduling. This setting ensures that scheduled search periods are never skipped. Splunk automatically sets this value to 0 for any scheduled search that is enabled for summary indexing.

The scheduler is designed to give searches with real-time scheduling priority over those with continuous scheduling; it always tries to run the real-time searches first.
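As a sketch, a savedsearches.conf stanza that forces continuous scheduling for a scheduled search might look like the following (the stanza name and cron schedule are illustrative; only realtime_schedule controls the scheduling option):

[Summary - firewall top src_ip]
enableSched = 1
cron_schedule = 0 * * * *
realtime_schedule = 0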

Design macro searches

To simplify managing your searches, you can create saved searches that include macros, which are parametrized chunks of a search. These search macros can be any part of a search, such as an eval statement or a search term, and do not need to be a complete command. With macros, you can reuse chunks of a search in multiple places, whether it's a saved search or an ad hoc search. You can also specify whether or not a search macro takes arguments.

Note: Form searches also use search macros, but include a graphical user interface component.

Configure and manage search macros

You can view, edit, and create search macros using Splunk Web's Manager > Advanced Search > Search macros page and macros.conf. For more information, see "Create and use search macros" in the User Manual and the macros.conf reference in the Admin Manual.
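As a quick illustration, a one-argument macro defined in macros.conf might look like the following sketch (the macro name and definition are hypothetical):

[top_by_host(1)]
args = field
definition = stats count by host, $field$ | sort -count

You would then invoke it in a search with backticks, as in `top_by_host(status)`.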

Design form searches

Form searches are simplified search interfaces that help guide users in the creation of specific kinds of searches. They can include things like:

• Open fields that take specific field values (such as user names or ID numbers) and can also display default values.
• Dropdown lists containing dynamically defined collections of search terms.
• Radio buttons that force the choice of particular field values (such as error codes like "404," "500," or "503").
• Multiple result panels that take the values received from one form and plug them into various hidden searches that in turn generate different kinds of charts and reports.

Form searches are created with XML code similar to that used for the construction of dashboards in Splunk. For more information, see the "Forms: an introduction" chapter of the Developer manual.

Define navigation to saved searches and reports

As a knowledge manager you should ensure that your saved searches and reports appear in the top-level navigation menus of your Splunk apps in a logical manner that facilitates ease of discovery. To do this you need to customize the navigation menus for your apps. If you fail to attend to your navigation menus, over time they may become overlong and inefficient, as saved searches and reports are added without subsequent categorization.

To manage the way your searches are saved and organized in the top-level navigation menu for an app, you need to work with the code behind the nav menu. When you do this, keep in mind that the nav code refers to lists of searches and reports as collections.

The following subtopics describe various things you can do to organize your saved search and report listings in the top-level navigation menu. For details on how to adjust the XML code for the navigation menu, see "Build navigation for your app" in the Developer manual.

Set up a default collection

Each app should have a default collection set up for "unclassified" searches. Unclassified searches are any searches that haven't been explicitly identified in the nav menu code. This is the collection in which all newly saved searches appear. In the Search app, for example, the default collection is Searches & Reports.

If you do not set up a default collection, you will have to manually add saved searches to the nav code to see them in your app's top-level navigation menu.

Note: A default collection should also be set up for unclassified views and dashboards.
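In the nav XML, a default collection can be declared by pointing a collection at unclassified saved searches and views, along the lines of the following sketch (the attribute values here are assumptions based on "Build navigation for your app" in the Developer manual, which remains the authoritative reference):

<nav>
  <collection label="Searches &amp; Reports">
    <saved source="unclassified" />
    <view source="unclassified" />
  </collection>
</nav>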

Organize saved searches in nested collections

As the number of saved searches and reports that are created for an app grows, you're going to want to find ways to organize those searches in a logical manner. You can manually construct collections that group lists together by function. Going further, you can set up nested collections that subdivide large collections into groups of smaller ones.

In the Search app, nested collections are used to group similar types of searches together:

Dynamically group together saved searches

Collections can be set up to dynamically group together saved searches that have matching substrings in their names. For example, in the Search app example above, a nested collection groups together all uncategorized searches with the string "admin" in their titles.

There are two ways that saved searches can be dynamically grouped together with matching substrings:

• As a collection of uncategorized substring-matching searches, which means that the collection only displays searches that haven't been manually added to another collection.
• As a collection of all substring-matching searches, which means that the collection displays all searches with the matching substring whether or not they appear elsewhere in the navigation menu.

Note: In both cases, only saved searches and reports that are available to the app with which the navigation menu is associated are displayed.
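A substring-matching collection might be sketched like this, assuming the match and source attributes described in "Build navigation for your app" (source="unclassified" lists only uncategorized matches, while source="all" would list every matching search):

<collection label="Admin searches">
  <saved source="unclassified" match="admin" />
</collection>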

Set up and use summary indexes

Use summary indexing for increased reporting efficiency

Splunk is capable of generating reports on massive amounts of data (100 million events and counting). However, the amount of time it takes to compute such reports is directly proportional to the number of events summarized. Plainly put, it can take a lot of time to search through very large data sets. If you only have to do this on an occasional basis, it may not be an issue. But running such reports on a regular schedule can be impractical--and this impracticality only increases exponentially as more and more users in your organization use Splunk to run similar reports.

Use summary indexing to efficiently report on large volumes of data. With summary indexing, you set up a search that extracts the precise information you want, on a frequent basis. Each time Splunk runs this search it saves the results into a summary index that you designate. You can then run searches and reports on this significantly smaller (and thus seemingly "faster") summary index. And what's more, these reports will be statistically accurate because of the frequency of the index-populating search (for example, if you want to manually run searches that cover the past seven days, you might run them on a summary index that is updated on an hourly basis).

Summary indexing allows the cost of a computationally expensive report to be spread over time. In the example we've been discussing, the hourly search to populate the summary index with the previous hour's worth of data would take a fraction of a minute. Generating the complete report without the benefit of summary indexing would take approximately 168 (7 days * 24 hrs/day) times longer.

Perhaps an even more important advantage of summary indexing is its ability to amortize costs over different reports, as well as for the same report over a different but overlapping time range. The same summary data generated on a Tuesday can be used for a report of the previous 7 days done on the Wednesday, Thursday, or the following Monday. It could also be used for a monthly report that needed the average response size per day.

Summary indexing use cases

Example #1 - Run reports over long time ranges for large datasets more efficiently: Imagine you're using Splunk at a company that indexes tens of millions of events per day. You want to set up a dashboard for your employees that, among other things, displays a report that shows the number of page views and visitors each of your Web sites had over the past 30 days, broken out by site.

You could run this report on your primary data volume, but its runtime would be quite long, because Splunk has to sort through a huge number of events that are totally unrelated to web traffic in order to extract the desired data. But that's not all--the fact that the report is included in a popular dashboard means it'll be run frequently, and this could significantly extend its average runtime, leading to a lot of frustrated users.

But if you use summary indexing, you can set up a saved search that collects website page view and visitor information into a designated summary index on a weekly, daily, or even hourly basis. You'll then run your month-end report on this smaller summary index, and the report should complete far faster than it would otherwise because it is searching on a smaller and better-focused dataset.

Example #2 - Building rolling reports: Say you want to run a report that shows a running count of an aggregated statistic over a long period of time--a running count of downloads of a file from a Web site you manage, for example.

First, schedule a saved search to return the total number of downloads over a specified slice of time. Then, use summary indexing to have Splunk save the results of that search into a summary index. You can then run a report any time you want on the data in the summary index to obtain the latest count of the total number of downloads.

For another view, you can watch this Splunk developer video about the theory and practice of summary indexing.

Using the summary indexing reporting commands

If you are new to summary indexing, use the summary indexing reporting commands (sichart, sitimechart, sistats, sitop, and sirare) when you define the search that will populate the summary index. If you use these commands you can use the same search string that you use for the search that you eventually run on the summary index, with the exception that you use regular reporting commands in the latter search.

Note: You do not have to use the si- summary index search commands if you are proficient with the "old-school" way of creating summary-index-populating searches. If you create summary indexes using those methods and they work for you, there's no need to update them. In fact, they may be more efficient: there are performance impacts related to the use of the si- commands, because they create slightly larger indexes than the "manual" method does.

In most cases the impact is insignificant, but you may notice a difference if the summary indexes you are creating are themselves fairly large. You may also notice performance issues if you're setting up several searches to report against an index populated by an si- command search.

Defining index-populating searches without the special commands

In previous versions of Splunk you had to be very careful about how you designed the searches that you used to populate your summary index, especially if the search you wanted to run on the finished summary index involved aggregate statistics, because it meant that you had to carefully set up the "index-populating" search in a way that did not provide incorrect results. For example, if you wanted to run a search on the finished summary index that gave you average response times broken out by server, you'd want to set up a summary-index-populating search that:

• Is scheduled to run on a more frequent basis than the search you plan to run against the summary index.
• Samples a larger amount of data than the search you plan to run against the summary index.
• Contains additional search commands that ensure that the index-populating search is generating a weighted average.

The summary index reporting commands take care of the last two points for you--they automatically determine the adjustments that need to be made so that your summary index is populated with data that does not produce statistically inaccurate results. However, you still should arrange for the summary-index-populating search to run on a more frequent basis than the search that you later run against the summary index.

If you would like more information about setting up summary-index-populating searches that do not use the special summary index reporting commands, see "Configure summary indexes" in the Knowledge Manager manual.

Summary indexing reporting command usage example

Let's say you've been running the following search, with a time range of the past year:

eventtype=firewall | top src_ip

This search gives you the top source IPs for the past year, but it takes forever to run because it scans across your entire index each time.

What you need to do is create a summary index that is composed of the top source IPs from the "firewall" event type. You can use the following search to build that summary index. You would schedule it to run on a daily basis, collecting the top src_ip values for only the previous 24 hours each time. The results of each daily search are added to an index named "summary":

eventtype=firewall | sitop src_ip

Note: Summary-index-populating searches are statistically more accurate if you schedule them to run and sample information on a more frequent basis than the searches you plan to run against the finished summary index. So in this example, because we plan to run searches that cover a timespan of a year, we set up a summary-index-populating search that samples information on a daily basis.

Important: When you define summary-index-populating searches, do not pipe other search operators after the main summary indexing reporting command. In other words, don't include additional | eval commands and the like. Save the extra search operators for the searches you run against the summary index, not the search you use to populate it.

Important: The results from a summary-indexing-optimized search are stored in a special format that cannot be modified before the final transformation is performed. This means that if you populate a summary index with ... | sistats <args>, the only valid retrieval of the data is: index=<summary> source=<saved search name> | stats <args>. The search against the summary index cannot create or modify fields before the | stats <args> command.

Now, let's say you save this search with the name "Summary - firewall top src_ip" (all saved summary-index-populating searches should have names that identify them as such). After your summary index is populated with results, search and report against that summary index using a search that specifies the summary index and the name of the search that you used to populate it. For example, this is the search you would use to get the top source IPs over the past year:


index=summary search_name="summary - firewall top src_ip" | top src_ip

Because this search specifies the search name, it filters out other data that have been placed in the summary index by other summary indexing searches. This search should run fairly quickly, even if the time range is a year or more.

Note: If you are running a search against a summary index that queries for events with a specific sourcetype value, be aware that you need to use orig_sourcetype instead. So instead of running a search against a summary index like ... | timechart avg(ip) by sourcetype, use ... | timechart avg(ip) by orig_sourcetype.

Why do you have to do this? When events are gathered into a summary index, Splunk changes their sourcetype values to "stash" and moves the original sourcetype values to orig_sourcetype.

Setting up summary index searches in Splunk Web

You can set up summary index searches through the Splunk Web interface. Summary indexing is an alert option for saved, scheduled searches. Once you determine the search that you want to use to populate a summary index, follow these steps:

1. Go to the Search details page for the search, either by clicking Save search in the Search or Report Builder interface, or through the Searches and Reports page in Manager by selecting the name of a previously saved search or clicking New.

2. Select Schedule this search if the search isn't already scheduled. Schedule the search to run on an appropriate interval. Remember that searches that populate summary indexes should run on a fairly frequent basis in order to create statistically accurate final reports. If the search you're running against the summary index is gathering information for the past week, you should have the summary search run on an hourly basis, collecting information for each hour. If you're running searches over the past year's worth of data, you might have the summary index collect data on a daily basis for the past day.

Note: Be sure to schedule the search so that there are no data gaps and overlaps. For more on this, see the subtopic on this issue, below.


3. Under Alert conditions, select a Perform actions value of always.

4. Under Alert actions, select Enable summary indexing.

5. Enter the name of the summary index that the search will be populating. The Summary index is the default summary index. You may need to create additional summary indexes if you plan to run a variety of summary index searches. For information about creating new indexes, see "Set up multiple indexes" in the Admin manual. It's a good idea to create indexes that are dedicated to the collection of summary data.

Note: If you enter the name of an index that does not exist, Splunk will run the search on the schedule you've defined, but its data will not get saved to a summary index.

6. (Optional) Under Add fields, you can add field/value pairs to the summary index definition. These key/value pairs will be added to each event that gets summary indexed, making it easier to find them with later searches. For example, you could add the name of the saved search populating the summary index (report=summary_firewall_top_src_ip) or the name of the index that the search populates (index=summary), and then search on those terms later.
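For example, if you added the report=summary_firewall_top_src_ip pair described above, a later search could use it to isolate just those summarized events (a sketch):

index=summary report=summary_firewall_top_src_ip | top src_ip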

Note: You can also add field/value pairs to the summary index configuration in savedsearches.conf. For more information, see "Configure summary indexes" in the Knowledge Manager manual.

For more information about saving, scheduling, and setting up alerts for searches, see "Save searches and share search results", "Schedule saved searches", and "Set alert conditions for scheduled searches", in this manual.


Schedule the populating search to avoid data gaps and overlaps

To minimize data gaps and overlaps, be sure to set appropriate intervals and delays in the schedules of the searches you use to populate summary indexes.

Gaps in a summary index are periods of time when a summary index fails to index events. Gaps can occur if:

• the summary-index-populating search takes too long to run and runs past the next scheduled run time. For example, if you were to schedule the search that populates the summary to run every 5 minutes when that search typically takes around 7 minutes to run, you would have problems, because the search won't run again while it's still running a preceding search.

• splunkd goes down.

Overlaps are events in a summary index (from the same search) that share the same timestamp. Overlapping events skew reports and statistics created from summary indexes. Overlaps can occur if you set the time range of a saved search to be longer than the frequency of the schedule of the search. In other words, don't arrange for an hourly search to gather data for the past 90 minutes.

Note: If you think you have gaps or overlaps in your summary index data, Splunk provides methods of detecting them and either backfilling them (in the case of gaps) or deleting the overlapping events. For more information, see "Manage summary index gaps and overlaps" in the Knowledge Manager manual.

How summary indexing works

In Splunk Web, summary indexing is an alert option for scheduled saved searches. When you run a saved search with summary indexing turned on, its search results are temporarily stored in a file ($SPLUNK_HOME/var/spool/splunk/<savedsearch_name>_<random-number>.stash). From the file, Splunk uses the addinfo command to add general information about the current search, along with the fields you specify during configuration, to each result. Splunk then indexes the resulting event data in the summary index that you've designated for it (index=summary by default).

Note: Use the addinfo command to add fields containing general information about the current search to the search results going into a summary index. The general information added about the search helps you run reports on results you place in a summary index.
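For example, appending addinfo to a simple search (the event type here is illustrative) lets you preview the general-information fields, such as the search's earliest and latest times, that would accompany each summary-indexed result:

eventtype=firewall | top src_ip | addinfo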

Summary indexing of data without timestamps

To set the time for summary index events, Splunk uses the following information, in this order of precedence:

1. The _time value of the event being summarized

2. The earliest (or minimum) time of the search


3. The current system time (in the case of an "all time" search, where no "earliest" value is specified)

In the majority of cases, your events will have timestamps, so the first method of discerning the summary index timestamp holds. But if you are summarizing data that doesn't contain an _time field (such as data from a lookup), the resulting events will have the timestamp of the earliest time of the search.

For example, if you summarize the lookup "asset_table" every night at midnight, and the asset table does not contain an _time column, tonight's summary will have an _time value equal to the earliest time of the search. If you have set the time range of the search to be between -24h and +0s, each summarized event will have an _time value of now()-86400 (the time the search runs minus 86,400 seconds, or 24 hours). This means that every event without an _time field value that is found by this summary-index-populating search will be given the exact same _time value: the search's earliest time.

The best practice for summarizing data without a timestamp is to manually create an _time value as part of your search. Following on from the example above:

|inputlookup asset_table | eval _time=now()

Manage summary index gaps and overlaps

The accuracy of your summary index searches can be compromised if the summary indexes involved have gaps or overlaps in their collected data.

Gaps in summary index data can come about for a number of reasons:

• A summary index initially only contains events from the point that you start data collection: Don't lose sight of the fact that summary indexes won't have data from before the summary index collection start date--unless you arrange to put it in there yourself with the backfill script.

• splunkd outages: If splunkd goes down for a significant amount of time, there's a good chance you'll get gaps in your summary index data, depending on when the searches that populate the index are scheduled to run.

• Searches that run longer than their scheduled intervals: If the search you're using to populate the summary index runs longer than the interval that you've scheduled it to run on, then you're likely to end up with gaps, because Splunk won't run a scheduled search again while a preceding search is still running. For example, if you were to schedule the index-populating search to run every five minutes, you'll have a gap in the index data collection if the search ever takes more than five minutes to run.

Overlaps are events in a summary index (from the same index-populating search) that share the same timestamp. Overlapping events skew reports and statistics created from summary indexes. Overlaps can occur if you set the time range of a saved search to be longer than the scheduled search interval. In other words, don't arrange for an hourly search to gather data for the past 90 minutes.


Note: For general information about creating and maintaining summary indexes, see "Use summary indexing for increased reporting efficiency" in the Knowledge Manager manual.

Use the backfill script to add other data or fill summary index gaps

The fill_summary_index.py script backfills gaps in summary index collection by running the saved searches that populate the summary index as they would have been executed at their regularly scheduled times over a given time range. In other words, even though your new summary index only started collecting data at the start of this week, if necessary you can use fill_summary_index.py to fill the summary index with data from the past month.

In addition, when you run fill_summary_index.py you can specify an App and schedule backfill actions for a list of summary index searches associated with that App, or simply choose to backfill all saved searches associated with the App.

When you enter the fill_summary_index.py commands through the CLI, you must provide the backfill time range by indicating an "earliest time" and "latest time" for the backfill operation. You can indicate the precise times either by using relative time identifiers (such as -3d@d for "3 days ago at midnight") or by using UTC epoch numbers. The script automatically computes the times during this range when the summary index search would have been run.

Note: To ensure that the fill_summary_index.py script only executes summary index searches at times that correspond to missing data, you must use -dedup true when you invoke it.

The fill_summary_index.py script requires that you provide the necessary authentication (username and password). If you have a valid Splunk authentication key when you invoke the script, you can pass it in via the -sk option.

The script is designed to prompt you for any required information that you fail to provide on the command line, including the names of the summary index searches, the authentication information, and the time range.

Examples of fill_summary_index.py invocation

If this is your situation:

You need to backfill all of the summary index searches for the splunkdotcom App for the past month--but you also need to skip any searches that already have data in the summary index:

Then you'd enter this into the CLI:

./splunk cmd python fill_summary_index.py -app splunkdotcom -name "*" -et -mon@mon -lt @mon -dedup true -auth admin:changeme

If this is your situation:

You need to backfill the my_daily_search summary index search for the past year, running no more than 8 concurrent searches at any given time (to reduce impact on Splunk performance while the system collects the backfill data). You do not want the script to skip searches that already have data in the summary index. The my_daily_search summary index search is owned by the "admin" role.

Then you'd enter this into the CLI:

./splunk cmd python fill_summary_index.py -app search -name my_daily_search -et -y -lt now -j 8 -owner admin -auth admin:changeme

Note: You need to specify the -owner option for searches that are owned by a specific user or role.

What to do if fill_summary_index.py is interrupted while running

In the app that you are invoking fill_summary_index.py from (default: 'search'), there will be a 'log'directory. In this directory, there will be an empty temp file named 'fsidx*lock'.

Delete the 'fsidx*lock' file and you will be able to restart fill_summary_index.py.
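On a typical installation, assuming you invoked the script from the default 'search' app, the cleanup might look something like this (the exact path depends on your app context):

rm $SPLUNK_HOME/etc/apps/search/log/fsidx*lock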

fill_summary_index.py usage and commands

In the CLI, start by entering:

python fill_summary_index.py

...and add the required and optional fields from the table below.

Note: <boolean> options accept the values 1, t, true, or yes for "true" and 0, f, false, or no for "false."

-et <string>
    Earliest time (required). Either a UTC time or a relative time string.

-lt <string>
    Latest time (required). Either a UTC time or a relative time string.

-app <string>
    The application context to use (defaults to None).

-name <string>
    Specify a single saved search name. Can be specified multiple times to provide multiple names. Use the wildcard symbol ("*") to specify all enabled, scheduled saved searches that have a summary index action.

-names <string>
    Specify a comma-separated list of saved search names.

-namefile <filename>
    Specify a file with a list of saved search names, one per line. Lines beginning with a # are considered comments and ignored.

-owner <string>
    The user context to use (defaults to "None").

-index <string>
    Identifies the summary index that the saved search populates. If the index is not provided, the backfill script tries to determine it automatically. If this attempt at automatic index detection fails, the index defaults to "summary".

-auth <string>
    The authentication string expects either <username> or <username>:<password>. If only a username is provided, the script requests the password interactively.

-sleep <float>
    Number of seconds to sleep between each search. Default is 5 seconds.

-j <int>
    Maximum number of concurrent searches to run (default is 1).

-dedup <boolean>
    When this option is set to true, the script doesn't run saved searches for a scheduled time if data already exists in the summary index. If this option is not used, its default is false.

-showprogress <boolean>
    When this option is set to true, the script periodically shows the completion progress of each currently running search that it spawns. If this option is unused, its default is false.

Advanced options: these should not be needed in most cases.

-trigger <boolean>
    When this option is set to false, the script runs each search but does not trigger the summary indexing action. If this option is unused, its default is true.

-dedupsearch <string>
    Indicates the search to be used to determine whether data corresponding to a particular saved search at a specific scheduled time is already present.

-namefield <string>
    Indicates the field in the summary index data that contains the name of the saved search that generated that data.

-timefield <string>
    Indicates the field in the summary index data that contains the scheduled time of the saved search that generated that data.

Use the overlap command to identify summary index gaps and overlaps

To identify gaps and overlaps in your data, run a search against the summary index that uses the overlap command. This command identifies ranges of time in the index that include gaps or overlaps. If you suspect that a particular time range might include gaps and/or overlaps, you can identify it in the search by specifying a start time and end time, or a period and a saved search name, following the | overlap command in the search string.

Use these two parameters to define a specific calendar time range:

• StartTime: Time to start searching for missing entries, starttime=mm/dd/yyyy:hh:mm:ss (for example: 05/20/2008:00:00:00).

• EndTime: Time to stop searching for missing entries, endtime=mm/dd/yyyy:hh:mm:ss (for example: 05/22/2008:00:00:00).

Or use these two parameters to define a period of time and the saved search to search for missing events with:

• Period: Specify the length of the time period to search, period=<integer>[smhd] (for example: 5m).

• SavedSearchName: Specify the name of the saved search to search for missing events with, savedsearchname=<string> (no wildcards).
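For example, a search along these lines (a sketch; the index name, period, and saved search name are illustrative) checks the events written by a single populating search for gaps and overlaps:

index=summary | overlap period=1d savedsearchname="Summary - firewall top src_ip"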

If you identify a gap, you can run your scheduled saved search over the period of the gap and summary index the results with the backfill script, as described above.


If you identify overlapping events, you can manually delete the overlaps from the summary index by using the search language.

Configure summary indexes

For a general overview of summary indexing and instructions for setting up summary indexing through Splunk Web, see the topic "Use summary indexing for increased reporting efficiency" in the Knowledge Manager manual.

You can't manually configure a summary index for a search in savedsearches.conf until the search is saved, scheduled, and has the Enable summary indexing alert option selected.

In addition, you need to enter the name of the summary index that the search will populate. You do this through the saved search dialog after selecting Enable summary indexing. The Summary index is the default summary index (the index that Splunk uses if you do not indicate another one).

If you plan to run a variety of summary index searches you may need to create additional summary indexes. For information about creating new indexes, see "Set up multiple indexes" in the Admin manual. It's a good idea to create indexes that are dedicated to the collection of summary data.

Note: If you enter the name of an index that does not exist, Splunk will run the search on the schedule you've defined, but its data will not get saved to a summary index.

For more information about saving, scheduling, and setting up alerts for searches, see "Save searches and share search results", "Schedule saved searches", and "Set alert conditions for scheduled searches", in the User manual.

Note: When you define the search that you'll use to build your summary index, most of the time you should use the summary indexing reporting commands. These commands are prefixed with "si": sichart, sitimechart, sistats, sitop, and sirare. The searches you create with them should be versions of the search that you'll eventually use to query the completed summary index.

The summary index reporting commands automatically take into account the issues that are covered in "Considerations for summary index search definition", below, such as scheduling shorter time ranges for the populating search, and setting the populating search to take a larger sample. You only have to worry about these issues if the search that you are using to build your index does not include summary index reporting commands.

If you do not use the summary index reporting commands, you can use the addinfo and collect search commands to create a search that Splunk saves and schedules, and which populates a pre-created summary index. For more information about that method, see "Manually configure a search to populate a summary index" in this topic.


Customize summary indexing for a saved, scheduled search

When you use Splunk Web to enable summary indexing for a saved, scheduled search, Splunk automatically generates a stanza in $SPLUNK_HOME/etc/system/local/savedsearches.conf. You can customize summary indexing for the search by editing this stanza.

If you've used Splunk Web to save and schedule a search, but haven't used Splunk Web to enable summary indexing for the search, you can easily enable summary indexing for the saved search through savedsearches.conf, as long as you have a new index for it to populate. For more information about manual index configuration, see the topic "About managing indexes" in the Admin manual.

[<name>]
action.summary_index = 0 | 1
action.summary_index._name = <index>
action.summary_index.<field> = <value>

• [<name>]: Splunk names the stanza based on the name of the saved and scheduled search that you enabled for summary indexing.

• action.summary_index = 0 | 1: Set to 1 to enable summary indexing. Set to 0 to disable summary indexing.

• action.summary_index._name = <index>: Identifies the summary index populated by the search. If you've created a specific summary index for this search, enter its name in <index>. Defaults to summary, the summary index that is delivered with Splunk.

• action.summary_index.<field> = <value>: Specify a field/value pair to add to every event that gets summary indexed by this search. You can define multiple field/value pairs for a single summary index search.

This field/value pair acts as a "tag" of sorts that makes it easier for you to identify the events that go into the summary index when you are searching amongst the greater population of event data. This key is optional, but we recommend that you never set up a summary index without at least one field/value pair.

For example, add the name of the saved search that is populating the summary index (action.summary_index.report = summary_firewall_top_src_ip), or the name of the index that the search populates (action.summary_index.index = summary).
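Putting these keys together, a minimal stanza for the firewall example used earlier in this topic might look like this sketch (the search name and field values are illustrative):

[Summary - firewall top src_ip]
action.summary_index = 1
action.summary_index._name = summary
action.summary_index.report = summary_firewall_top_src_ip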

Search commands useful to summary indexing

Summary indexing makes use of a set of specialized search commands that you need to use if you are manually creating your summary indexes without the help of the Splunk Web interface or the summary indexing reporting commands.

• addinfo: Summary indexing uses addinfo to add fields containing general information about the current search to the search results going into a summary index. Add | addinfo to any search to see what results will look like if they are indexed into a summary index.

• collect: Summary indexing uses collect to index search results into the summary index. Use | collect to index any search results into another index (using collect command options).

• overlap: Use overlap to identify gaps and overlaps in a summary index. overlap finds events of the same query_id in a summary index with overlapping timestamp values, or identifies periods of time where there are missing events.

Manually configure a search to populate a summary index

If you want to configure summary indexing without using the search options dialog in Splunk Web and the summary indexing reporting commands, you must first configure a summary index just like you would any other index, via indexes.conf. For more information about manual index configuration, see the topic "About managing indexes" in the Admin manual.
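For example, a dedicated summary index stanza in indexes.conf might look something like this sketch (the index name is illustrative):

[summary_web]
homePath = $SPLUNK_DB/summary_web/db
coldPath = $SPLUNK_DB/summary_web/colddb
thawedPath = $SPLUNK_DB/summary_web/thaweddb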

Important: You must restart Splunk for changes in indexes.conf to take effect.

1. Run a search that you want to summarize results from in the Splunk Web search bar.

• Be sure to limit the time range of your search. The number of results that your search generates needs to fit within the maximum search result limits you have set for searching.

• Make sure to choose a time interval that works for your data, such as 10 minutes, 2 hours, or 1 day. (For more information about setting intervals in Splunk Web, see "Scheduling saved searches" in the User Manual.)

2. Use the addinfo search command. Append | addinfo to the end of your search.

This command adds information about the search to events, which the collect command requires in order to place them into a summary index.

You can always add | addinfo to any search to preview what the results of a search will look like in a summary index.

3. Add the collect search command. Append | collect index=<index_name> addtime=t marker="info_search_name=\"<summary_search_name>\"" to the end of the search.

• Replace <index_name> with the name of the summary index.

• Replace <summary_search_name> with a key to find the results of this search in the index.

• A summary_search_name *must* be set if you wish to use the overlap search command on the generated events.

Note: For the general case we recommend that you use the provided summary_index alert action. Configuring via addinfo and collect requires some redundant steps that are not needed when you generate summary index events from scheduled searches. Manual configuration remains necessary when you backfill a summary index for time ranges which have already transpired.
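Putting the three steps together, a manual populating search for the firewall example used earlier in this topic might look like this sketch (the event type, limit, and search name are illustrative):

eventtype=firewall | top limit=100 src_ip | addinfo | collect index=summary addtime=t marker="info_search_name=\"Summary - firewall top src_ip\""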

Considerations for summary index search definition

If for some reason you're going to set up a summary-index-populating search that does not use the summary indexing reporting commands, you should take a few moments to plan out your approach. With summary indexing, the egg comes before the chicken: use the search that you actually want to report on to help define the search you use to populate the summary index.


Many summary searches involve aggregated statistics--for example, a report where you are searching for the top 10 IP addresses associated with firewall offenses over the past day--when the main index accrues millions of events per day.

If you populate the summary index with the results of the same search that you run on the summary index, you'll likely get results that are statistically inaccurate. You should follow these rules when defining the search that populates your summary index to improve the accuracy of aggregated statistics generated from summary index searches.

Schedule a shorter time range for the populating search

The search that populates your summary index should be scheduled on a shorter (and therefore more frequent) interval than that of the search that you eventually run against the index. You should go for the smallest time range possible. For example, if you need to generate a daily "top" report, then the search populating the summary index should take its sample on an hourly basis.

Set the populating search to take a larger sample

The search populating the summary index should seek out a significantly larger sample than the search that you want to run on the summary index. So, for example, if you plan to search the summary index for the daily top 10 offending IP addresses, you would set up a search to populate the summary index with the hourly top 100 offending IP addresses.

This approach has two benefits--it ensures a higher amount of statistical accuracy for the top 10 report (due to the larger and more-frequently-taken overall sample), and it gives you a bit of wiggle room if you decide you'd rather report on the top 20 or 30 offending IPs.

The summary indexing reporting commands automatically take a sample that is larger than the search that you'll run to query the completed summary index, thus creating summary indexes with event data that is not incorrectly skewed. If you do not use those commands, you can use the head command to select a larger sample for the summary-index-populating search than for the search that you run on the summary index. In other words, you would have | head 100 in the hourly summary-index-populating search, and | head 10 in the daily search of the completed summary index.
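For example, the daily search against the summary index might re-aggregate the hourly top-100 results and keep only the top 10, along these lines (a sketch; the search name is illustrative, and the count field is assumed to have been produced by top in the populating search):

index=summary search_name="Summary - firewall top src_ip" | stats sum(count) as count by src_ip | sort -count | head 10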

Set up your search to get a weighted average

If your summary-index-populating search involves averages, and you are not using the summary indexing reporting commands, you need to set that search up to get a weighted average.

For example, say you want to build hourly, daily, or weekly reports of average response times. To do this, you'd generate the "daily average" by averaging the "hourly averages" together. Unfortunately, the daily average becomes skewed if there aren't the same number of events in each "hourly average". You can get the correct "daily average" by using a weighted average function.
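One way to set this up (a sketch; resp_time, the event type, and the search name are assumptions, while the hourly_* field names match the expression shown below) is to have the hourly populating search store a sum and an event count rather than an average:

eventtype=web | stats sum(resp_time) as hourly_resp_time_sum, count as hourly_resp_time_count | addinfo | collect index=summary addtime=t marker="info_search_name=\"Summary - hourly response times\""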

The following expression calculates the daily average response time correctly with a weighted average, by using the stats and eval commands in conjunction with the sum statistical aggregator. In this example, the eval command creates a daily_average field, which is the result of dividing the response time sum by the response time count.


| stats sum(hourly_resp_time_sum) as resp_time_sum, sum(hourly_resp_time_count) as resp_time_count | eval daily_average=resp_time_sum/resp_time_count | ...

Schedule the populating search to avoid data gaps and overlaps

Along with the above two rules, to minimize data gaps and overlaps you should also be sure to set appropriate intervals and delays in the schedules of the searches you use to populate summary indexes.

Gaps in a summary index are periods of time when a summary index fails to index events. Gaps can occur if:

• splunkd goes down.

• the scheduled saved search (the one being summary indexed) takes too long to run and runs past the next scheduled run time. For example, if you were to schedule the search that populates the summary to run every 5 minutes when that search typically takes around 7 minutes to run, you would have problems, because the search won't run again while it's still running a preceding search.

Overlaps are events in a summary index (from the same search) that share the same timestamp. Overlapping events skew reports and statistics created from summary indexes. Overlaps can occur if you set the time range of a saved search to be longer than the frequency of the schedule of the search, or if you manually run summary indexing using the collect command.

Example of a summary index configuration

This example shows a configuration for a summary index as it might appear in savedsearches.conf. The keys listed below enable summary indexing for the saved search "Apache Method Summary", and append the field report with a value of "count by method" to every event going into the summary index.

# name of the saved search = Apache Method Summary
[Apache Method Summary]
# sets the search to run at each search interval
counttype = always
# enable the search schedule
enableSched = 1
# search interval in cron notation (this means "every 12 minutes")
schedule = */12 * * * *
# id of user for saved search
userid = jsmith
# search string for summary index
search = index=apache_raw startminutesago=30 endminutesago=25 | extract auto=false | stats count by method
# enable summary indexing
action.summary_index = 1
# name of summary index to which search results are added
action.summary_index._name = summary
# add these keys to each event
action.summary_index.report = "count by method"


Other configuration files affected by summary indexing

In addition to the settings you configure in savedsearches.conf, there are also settings for summary indexing in indexes.conf and alert_actions.conf.

The indexes.conf file specifies index configuration for the summary index, while alert_actions.conf controls the alert actions (including summary indexing) associated with saved searches.

Caution: Do not edit settings in alert_actions.conf without explicit instructions from Splunk staff.
