andrea: indexing

What is this fishbucket thing?

It’s time for a little Indexing 101. If you look in the directory where your Splunk datastore resides (default location /opt/splunk/var/lib/splunk), you will find a directory called fishbucket. This index is not really intended for normal humans to investigate; it is mostly there for Splunk engineers trying to decipher file input issues. It contains seek pointers and CRCs for the files you are indexing, so splunkd can tell whether it has already read them. To see what’s there, try searching for “index=_thefishbucket”. Events look something like this:

48a304b3 initcrc::5f66db978a1ff3a3 seekcrc::bc96de428cc0b5e6 seekptr::414063 modtime::1218643123 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log

The fields are:

timestamp (epoch time, in hex)
initcrc: CRC of the first 256 bytes of the file
seekcrc: CRC of the 256 bytes where we were last reading
seekptr: seek pointer for where we are in the file
modtime: the time the file last changed (epoch time)
filename: the full path to the file
source: the full path to the source, which is usually the same as the file but could be the archive the file came from
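
If you ever want to pick one of these events apart outside of Splunk, the layout above is easy to unpack. The following is only a rough sketch in Python, assuming events always look like the sample shown here: a leading hex timestamp followed by space-separated key::value pairs. The helper name parse_fishbucket_event is made up for illustration and is not part of Splunk, and paths containing spaces would need more careful handling.

```python
from datetime import datetime, timezone


def parse_fishbucket_event(line):
    """Split one fishbucket-style event into a dict.

    Hypothetical helper: assumes a leading hex timestamp followed by
    space-separated key::value pairs, as in the sample event above.
    """
    hex_ts, *pairs = line.split()
    # The first token is the epoch timestamp written in hexadecimal.
    record = {"timestamp": int(hex_ts, 16)}
    for pair in pairs:
        key, _, value = pair.partition("::")
        record[key] = value
    # The seek pointer and modification time are plain decimal integers.
    record["seekptr"] = int(record["seekptr"])
    record["modtime"] = int(record["modtime"])
    return record


sample = (
    "48a304b3 initcrc::5f66db978a1ff3a3 seekcrc::bc96de428cc0b5e6 "
    "seekptr::414063 modtime::1218643123 "
    "filename::/var/log/apache2/feorlen_org_access_log "
    "source::/var/log/apache2/feorlen_org_access_log"
)

event = parse_fishbucket_event(sample)
print(event["timestamp"], event["seekptr"], event["filename"])
# Convert the epoch timestamp to a readable UTC time.
print(datetime.fromtimestamp(event["timestamp"], tz=timezone.utc).isoformat())
```

In the sample event, the hex timestamp 48a304b3 decodes to 1218643123, which is the same value as modtime.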