inputcsv to restrict a search by a list of field values

A customer asked about a complicated search that could be greatly simplified by using inputcsv to read a list of values from a file, a feature added in 3.3.x. It’s documented as an internal search command here:

http://www.splunk.com/doc/latest/user/UnsupportedCommands#inputcsv

We are talking about making it a public command. The documentation lists it as unsupported, but it does work. Here’s how:

I have events from my web server for my new domain, and I want to see the real hits it’s getting, not my own. They look like this:


66.249.70.86 - - [23/Oct/2008:01:42:21 -0700] "GET /category/admin/ HTTP/1.1" 200 5158 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
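Splunk extracts the client address itself, but to make clear what the clientip field holds, here is a minimal Python sketch that pulls it out of a line in this format. The regular expression and every field name other than clientip are assumptions for illustration. This is not Splunk's actual access_common extraction.

```python
import re

# Hypothetical pattern for Apache-style access-log lines (illustration only;
# Splunk's own field extraction is configured separately and may differ).
LOG_RE = re.compile(
    r'^(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)'
)


def extract_fields(line):
    """Return a dict of fields from one access-log line, or None if it doesn't match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None
```

Applied to the sample line, `extract_fields(line)["clientip"]` gives `66.249.70.86`. That value is what the exclusion list below is matched against.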

And I’ve gotten some traffic already:


$ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log | stats count'
count
-----
11424

It’s a standard format that Splunk automatically recognized as sourcetype access_common, so the field “clientip” is already extracted. I create a CSV file containing the values I want to exclude, like this:


clientip
xxx.xxx.xxx.xxx
yyy.yyy.yyy.yyy
zzz.zzz.zzz.zzz

inputcsv looks for the file relative to $SPLUNK_HOME/var/run/splunk, so I’ll just put it there to avoid specifying a path in my search. I could also have used a wildcard value such as xxx.xxx.xxx.*, because wildcards are OK.
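If you generate the list from a script, the following minimal Python sketch writes it in the expected shape: a clientip header, then one value per line with no trailing commas. It assumes the SPLUNK_HOME environment variable is set. The /opt/splunk fallback is only a guess for illustration, and both function names are hypothetical.

```python
import csv
import io
import os
from pathlib import Path


def render_lookup_csv(values, field="clientip"):
    """Return CSV text: a header naming the field, then one value per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([field])  # the header must match the event field name
    for value in values:
        writer.writerow([value])  # wildcards such as "203.0.113.*" pass through unchanged
    return buf.getvalue()


def write_lookup_csv(values, filename="mycsvfile.csv", splunk_home=None):
    """Write the CSV under $SPLUNK_HOME/var/run/splunk so inputcsv can find it by bare name."""
    # Assumption: SPLUNK_HOME is set; "/opt/splunk" is only a guessed fallback.
    home = splunk_home or os.environ.get("SPLUNK_HOME", "/opt/splunk")
    out_dir = Path(home) / "var" / "run" / "splunk"
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / filename
    path.write_text(render_lookup_csv(values))
    return path
```

For example, `write_lookup_csv(["192.0.2.10", "203.0.113.*"])` would write the placeholder addresses, one per line, under the clientip header.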

Now I can do this search:


$ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log NOT [inputcsv mycsvfile.csv]'


$ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log NOT [inputcsv mycsvfile.csv] | stats count'
count
-----
121

and get only the hits that aren’t from my network. The same search also works from the UI:


source="/var/log/apache2/mynewdomain_access_log" NOT [inputcsv mycsvfile.csv]
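Conceptually, the subsearch's rows are turned into a search expression before the outer search runs. Each row becomes field="value", the rows are ORed together, and NOT negates the whole group. That is why the CSV header has to be the field name. The Python sketch below approximates that expansion. Splunk's exact spacing and quoting may differ, and subsearch_to_filter and exclusion_search are hypothetical helpers.

```python
import csv
import io


def subsearch_to_filter(csv_text):
    """Approximate a subsearch expansion: terms within a row are ANDed
    (space-separated), and rows are ORed together."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV has a header but no value rows")
    clauses = []
    for row in rows:
        terms = " ".join(f'{field}="{value}"' for field, value in row.items())
        clauses.append(f"( {terms} )")
    return "( " + " OR ".join(clauses) + " )"


def exclusion_search(source, csv_text):
    """Build the equivalent of: source=... NOT [inputcsv ...]."""
    return f'source="{source}" NOT ' + subsearch_to_filter(csv_text)
```

With the clientip file, the outer search effectively becomes `source="..." NOT ( ( clientip="..." ) OR ( clientip="..." ) )`.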

2 Responses to “inputcsv to restrict a search by a list of field values”

  1. Eric S Says:

    So two questions on this.
    1. Which form should “mycsvfile.csv” take:
    clientip,
    xxx.xxx.xxx.xxx,
    yyy.yyy.yyy.yyy,
    zzz.zzz.zzz.zzz,
    or
    clientip
    xxx.xxx.xxx.xxx
    yyy.yyy.yyy.yyy
    zzz.zzz.zzz.zzz
    or
    clientip,xxx.xxx.xxx.xxx,yyy.yyy.yyy.yyy,zzz.zzz.zzz.zzz

    2. Can this be used as an inclusion search instead of an exclusion search, i.e.
    source="/var/log/apache2/mynewdomain_access_log" [inputcsv mycsvfile.csv]
    Would that search work?

  2. andrea Says:

    There’s no comma after clientip, and each value goes on its own line. I haven’t specifically tested using it to include things (basically as a whitelist), but offhand I don’t see why it wouldn’t work. It’s a valid subsearch, so you can use NOT with it or not, as you prefer.

    N.B. I haven’t tried this on 4.0, although I have no reason to expect it to stop working.
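To make the blacklist/whitelist distinction concrete, here is a local Python model (not Splunk itself) of which hits `NOT [inputcsv ...]` keeps compared with a bare `[inputcsv ...]`. Using fnmatch for the * wildcard is an assumption that only approximates Splunk's wildcard matching, and filter_hits is a hypothetical name.

```python
import fnmatch


def filter_hits(client_ips, patterns, mode="exclude"):
    """Model NOT [inputcsv ...] (mode="exclude") versus [inputcsv ...] (mode="include").

    Patterns may contain * wildcards, as in xxx.xxx.xxx.*.
    """
    if mode not in ("exclude", "include"):
        raise ValueError("mode must be 'exclude' or 'include'")
    kept = []
    for ip in client_ips:
        matched = any(fnmatch.fnmatchcase(ip, p) for p in patterns)
        # Blacklist keeps hits that match nothing; whitelist keeps hits that match something.
        if matched != (mode == "exclude"):
            kept.append(ip)
    return kept
```

The same list therefore works in both directions. Only the presence of NOT decides whether matching hits are dropped or kept.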
