inputcsv to restrict a search by a list of field values
| Topics: | tech |
|---|---|
A customer asked about a complicated search that could be vastly simplified by using inputcsv to read a list of values from a file, a feature added in 3.3.x. It’s documented as an internal search command here:
http://www.splunk.com/doc/latest/user/UnsupportedCommands#inputcsv
We’re talking about promoting it to a public command, so although the docs call it unsupported, it does work. Here’s how:
I’ve got events from the webserver for my new domain, and I want to see what real hits it’s getting, excluding my own. They look like this:
66.249.70.86 - - [23/Oct/2008:01:42:21 -0700] "GET /category/admin/ HTTP/1.1" 200 5158 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
And I’ve gotten some traffic already:
$ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log | stats count'
count
-----
11424
It’s a standard format that was automatically recognized as sourcetype access_common, so the extracted field “clientip” is already there. I create a csv file containing the values I want to exclude like this:
clientip
xxx.xxx.xxx.xxx
yyy.yyy.yyy.yyy
zzz.zzz.zzz.zzz
This file needs to exist relative to $SPLUNK_HOME/var/run/splunk, so to avoid specifying a path in my search I’ll just put it there. Note that I could also have used xxx.xxx.xxx.* if I wanted to; wildcards are OK.
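Creating the file from a shell might look something like the sketch below. It assumes SPLUNK_HOME is set to your install directory. The temp-directory fallback is my own addition so the snippet can be tried on a machine without Splunk, and the masked addresses are the same placeholders as above.

```shell
# Assumption: SPLUNK_HOME points at your Splunk install. If it is unset, a temp
# directory stands in so the snippet can be tried without touching a real install.
base="${SPLUNK_HOME:-$(mktemp -d)}"
csv_dir="$base/var/run/splunk"
mkdir -p "$csv_dir"

# Header row is the field name; each value to exclude goes on its own line.
printf '%s\n' clientip xxx.xxx.xxx.xxx yyy.yyy.yyy.yyy zzz.zzz.zzz.zzz \
    > "$csv_dir/mycsvfile.csv"

# Show what was written.
cat "$csv_dir/mycsvfile.csv"
```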
Now I can do this search:
./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log NOT [inputcsv mycsvfile.csv]'
$ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log NOT [inputcsv mycsvfile.csv] | stats count'
count
-----
121
That leaves only the hits that aren’t from my network. This search also works from the UI as:
source="/var/log/apache2/mynewdomain_access_log" NOT [inputcsv mycsvfile.csv]
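If you want to sanity-check the count outside Splunk, the same exclusion can be roughly approximated with awk. This is only a local cross-check under my own assumptions, not how Splunk evaluates the subsearch: it treats the first whitespace-separated field of each log line as the client IP, it does exact matches only (no wildcard support), and the sample files and 192.0.2.x addresses are made up for illustration.

```shell
# Hypothetical sample data in a temp directory; 192.0.2.x are placeholder addresses.
work="$(mktemp -d)"
printf '%s\n' clientip 192.0.2.10 192.0.2.11 > "$work/mycsvfile.csv"
cat > "$work/access_log" <<'EOF'
192.0.2.10 - - [23/Oct/2008:01:42:21 -0700] "GET / HTTP/1.1" 200 5158 "-" "-"
66.249.70.86 - - [23/Oct/2008:01:42:22 -0700] "GET /category/admin/ HTTP/1.1" 200 5158 "-" "-"
192.0.2.11 - - [23/Oct/2008:01:42:23 -0700] "GET / HTTP/1.1" 200 5158 "-" "-"
EOF

# First file: remember every value after the header row.
# Second file: keep lines whose first field (the client IP) is not in that set.
kept="$(awk 'NR == FNR { if (FNR > 1) skip[$1] = 1; next } !($1 in skip)' \
    "$work/mycsvfile.csv" "$work/access_log" | wc -l | tr -d ' ')"

echo "$kept"
```

With this sample data only the Googlebot line survives, so the count printed is 1.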

July 9th, 2009 at 5:58 am
So two questions on this.
1. Does “mycsvfile.csv” have the form of
clientip,
xxx.xxx.xxx.xxx,
yyy.yyy.yyy.yyy,
zzz.zzz.zzz.zzz,
or
clientip
xxx.xxx.xxx.xxx
yyy.yyy.yyy.yyy
zzz.zzz.zzz.zzz
or
clientip,xxx.xxx.xxx.xxx,yyy.yyy.yyy.yyy,zzz.zzz.zzz.zzz
2. Can this be used as an inclusion search instead of an exclusion search?
i.e. source="/var/log/apache2/mynewdomain_access_log" [inputcsv mycsvfile.csv]
Would that search work?
July 9th, 2009 at 1:23 pm
There’s no comma after clientip, and each value goes on its own line (the second form in your list). I haven’t specifically tested using it to include things (basically as a whitelist), but offhand I don’t see why it wouldn’t work. It’s a valid subsearch, and you can put NOT in front of it or not, as you prefer.
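A quick way to check a file against that layout before pointing a search at it is sketched below. The check_single_column helper is hypothetical and only covers the single-column clientip case from this post; a multi-column CSV would legitimately contain commas and would need a different check.

```shell
# Hypothetical helper: succeeds only if the header is exactly "clientip"
# and no line in the file contains a comma.
check_single_column() {
    [ "$(head -n 1 "$1")" = "clientip" ] && ! grep -q ',' "$1"
}

# The three layouts from the question, written to a temp directory.
work="$(mktemp -d)"
printf '%s\n' 'clientip,' 'xxx.xxx.xxx.xxx,' 'yyy.yyy.yyy.yyy,' > "$work/form1.csv"
printf '%s\n' clientip xxx.xxx.xxx.xxx yyy.yyy.yyy.yyy > "$work/form2.csv"
printf '%s\n' 'clientip,xxx.xxx.xxx.xxx,yyy.yyy.yyy.yyy' > "$work/form3.csv"

for f in form1 form2 form3; do
    if check_single_column "$work/$f.csv"; then
        echo "$f: ok"
    else
        echo "$f: not the expected layout"
    fi
done
```

Only form2, with the bare header and one value per line, passes.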
n.b. I haven’t tried this on 4.0, although I have no reason to expect it wouldn’t still work.