40 Days of 4.0: Enriching Data with Lookups (Part 1)

Many customers tell me that they see a lot of value when Splunk is used to enrich IT data with information from another source.  An example of such an enrichment could be a cross-reference between a customer’s username found in an application log and that same customer’s information extracted from a contact management system.  How amazing would it be to have a customer service representative make a phone call to Mr. Smith to ask if he needed help logging on to their system after a number of failed logins?

Splunk has always been able to do data enrichment, but the newly released Splunk 4 really simplifies the process.  In this post, I’ll give a quick example of using a CSV file to provide data enrichment to an application log.  In future posts, I’ll show how to use an external database as the data source.

Let’s start with some mock application data.  To keep things simple, we’ll use this as our application log:
Jul 27 08:35:09 appname=app4 error=123
Jul 27 08:35:19 appname=app3 error=123
Jul 27 08:35:29 appname=app1 error=163
Jul 27 08:35:39 appname=app1 error=123
Jul 27 08:35:49 appname=app1 error=133
Jul 27 08:35:59 appname=app1 error=123
Jul 27 08:36:09 appname=app1 error=123
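
To make the idea concrete outside of Splunk, here is a minimal Python sketch of what CSV-based enrichment amounts to conceptually: pull the key=value fields out of each event, then add the matching row from a lookup table. The CSV contents, column names, and function names below are invented for illustration; this is not Splunk's lookup configuration, just a model of the join it performs.

```python
import csv
import io
import re

# Mock application log from the post.
LOG = """\
Jul 27 08:35:09 appname=app4 error=123
Jul 27 08:35:19 appname=app3 error=123
Jul 27 08:35:29 appname=app1 error=163
Jul 27 08:35:39 appname=app1 error=123
Jul 27 08:35:49 appname=app1 error=133
Jul 27 08:35:59 appname=app1 error=123
Jul 27 08:36:09 appname=app1 error=123
"""

# Hypothetical lookup table; the descriptions are placeholders, not real data.
LOOKUP_CSV = """\
error,description
123,example description A
133,example description B
163,example description C
"""

# Matches key=value tokens such as "error=123".
KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def load_lookup(text, key):
    """Index the CSV rows by the value found in the `key` column."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key]: row for row in reader}


def enrich(line, lookup, key):
    """Extract key=value fields from one event and add the lookup columns."""
    fields = dict(KV_PATTERN.findall(line))
    extra = lookup.get(fields.get(key), {})
    # Add lookup columns without overwriting fields already in the event.
    for name, value in extra.items():
        fields.setdefault(name, value)
    return fields


lookup = load_lookup(LOOKUP_CSV, "error")
events = [enrich(line, lookup, "error") for line in LOG.splitlines()]
```

Each entry in `events` is a dictionary holding the original `appname` and `error` fields plus the `description` column from the CSV. An event whose error code is missing from the table simply keeps its original fields.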

The Commoditization of the IT Professional (or is there a new Black Art?)

A recent gathering of friends (a group of IT gray-hairs, artists, and lawyers) got me thinking about IT as a profession, and the development of the industry since I got involved 20 years ago. The question posed to the group was whether we would recommend our current professions to our children. This query, a few others, and perhaps one Liberty Ale too many started me down the track of over-analyzing the state of IT today. I suppose I am both proud and terrified at the same time.

First, the goodness. Speaking as an industry participant, I can say that IT has come a long way. Collectively, we have successfully lobbied to become more than just a cost center. The ‘nerds in the back room’ have become intertwined with the business. IT now facilitates both cost savings and revenue generation. IT is the driving force and enabler of employee empowerment, productive mobility, and instantaneous communication. IT-run systems facilitate negotiations, analyze deals and execute trades.

Well done, everyone. A big pat on the back to us all.

Before I start sharing the negatives, I should let you know that I still have faith in the future of IT. Skip to the end if you don’t care for the doom and gloom.

Field Definitions and Splunk’s extract Command

The 3.0 version of Splunk has introduced some wonderful new features such as advanced reporting, granular access control and a slew of additional functions to help you search through your IT data. One of these newly released functions is the extract command. This works very nicely with Splunk’s revamped facility to add, view, and access field names. Here is a quick primer on creating field definitions and using the extract command to have those definitions reloaded automatically.

Splunk has always done a great job at allowing you to search on any text from any data source. Splunk even goes one step beyond this and automatically defines named fields for data that shows up as a Keyword = Value (KV) pair. If my data contains text that looks like

username=sparky

then Splunk will key in on those values, allowing me to search and report on them more precisely. For instance, I could say

* | where username <> "sparky"

to get back all of the records where sparky did not show up as a username.
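
As a rough illustration of the idea (not of how Splunk implements it), the automatic key=value handling can be sketched in a few lines of Python. The sample events and the function name `extract_fields` are made up for this example.

```python
import re

# Matches simple key=value tokens such as "username=sparky".
KV_PATTERN = re.compile(r"(\w+)=(\w+)")


def extract_fields(event):
    """Return a dict of every key=value pair found in the event text."""
    return dict(KV_PATTERN.findall(event))


# Hypothetical events, invented for this example.
events = [
    "login ok username=sparky host=kinja",
    "login ok username=rex host=kinja",
    "login failed username=sparky host=kinja",
]

# Keep only events that have a username field whose value is not "sparky".
not_sparky = [
    e for e in events
    if extract_fields(e).get("username") not in (None, "sparky")
]
```

Because the fields are named, the filter can test the username specifically rather than matching the word anywhere in the event text.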

But what if my data is not so friendly? Consider an event that looks like this:

Invalid login attempt by sparky on host kinja
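
There is no key=value pair to latch onto here, so the field names have to come from a pattern that describes where the values sit in the text. A minimal Python sketch of that idea, using a regular expression with named groups, might look like the following. The pattern is an assumption based on the single sample event above, and the field names `username` and `host` are chosen for illustration; this is not Splunk's field definition syntax.

```python
import re

# Named groups play the role of field names. The pattern is an assumption
# derived from the one sample event and may not cover other message formats.
LOGIN_PATTERN = re.compile(
    r"Invalid login attempt by (?P<username>\S+) on host (?P<host>\S+)"
)


def extract_login_fields(event):
    """Return the named fields for a matching event, or an empty dict."""
    match = LOGIN_PATTERN.search(event)
    return match.groupdict() if match else {}


fields = extract_login_fields("Invalid login attempt by sparky on host kinja")
```

Events that do not match the pattern return an empty dictionary, so they pass through without any extra fields.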