andrea: Archive for August, 2008

More fishbucket fun

When debugging files that keep getting re-indexed, sometimes what I want to see can only be found in the fishbucket index of the affected instance. I can pick up an entire index (3.x+) and drop it into another instance, but when working with the fishbucket there are a couple of other things to watch out for. Mainly, I don’t want anything to change it once I put it in the new instance. So I set up a throwaway instance where I can easily make changes I wouldn’t want to make to a real one.

REALLY BIG WARNING

Don’t do this to any Splunk instance you like. You will be unhappy later. Throw away your dummy instance when you are done so you don’t confuse anybody.

Set up a new instance of an appropriate version (the same as the original or more recent) and the appropriate architecture (ppc/sparc or intel). Get it all working with the correct ports so you don’t conflict with anything else that may be running on the machine. Since it won’t be indexing, the license doesn’t matter. Start it and then stop it so the first run stuff is done.

What is this fishbucket thing?

It’s time for a little Indexing 101. If you look in the directory where your Splunk datastore resides (default location /opt/splunk/var/lib/splunk) you will find a directory called fishbucket. This index is not really intended for normal humans to investigate; it is mostly for Splunk engineers trying to decipher file input issues. It contains seek pointers and CRCs for the files you are indexing, so splunkd can tell if it has read them already. To see what’s there, try searching for “index=_thefishbucket”. Events look something like this:

48a304b3 initcrc::5f66db978a1ff3a3 seekcrc::bc96de428cc0b5e6 seekptr::414063 modtime::1218643123 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log

The fields are:

timestamp (epoch time, in hex)
initcrc: CRC of the first 256 bytes of the file
seekcrc: CRC of the 256 bytes where we were last reading
seekptr: seek pointer for where we are in the file
modtime: the time the file last changed (epoch time)
filename: the full path to the file
source: the full path to the source, which is usually the same as the file but could be the archive the file came from
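
If you want to pick one of these events apart outside of Splunk, here is a rough sketch of how the fields above could be split up in Python. This is not anything Splunk ships: the function name parse_fishbucket_event is made up for illustration, and it assumes the event is a single line of whitespace-separated key::value pairs after the leading hex timestamp, with no spaces in the paths (a path containing spaces would need smarter handling).

```python
# Sample event copied from above.
SAMPLE = (
    "48a304b3 initcrc::5f66db978a1ff3a3 seekcrc::bc96de428cc0b5e6 "
    "seekptr::414063 modtime::1218643123 "
    "filename::/var/log/apache2/feorlen_org_access_log "
    "source::/var/log/apache2/feorlen_org_access_log"
)


def parse_fishbucket_event(line):
    """Split one fishbucket event into a dict (hypothetical helper).

    Assumes whitespace-separated tokens: a hex epoch timestamp first,
    then key::value pairs. Paths with spaces are not handled.
    """
    parts = line.split()
    # The first token is the epoch timestamp written in hex.
    record = {"timestamp": int(parts[0], 16)}
    for part in parts[1:]:
        key, sep, value = part.partition("::")
        if sep:  # skip any token that is not a key::value pair
            record[key] = value
    # The seek pointer and modification time are plain decimal integers.
    for key in ("seekptr", "modtime"):
        if key in record:
            record[key] = int(record[key])
    return record


record = parse_fishbucket_event(SAMPLE)
print(record["timestamp"])  # 1218643123
print(record["seekptr"])    # 414063
```

For the sample event, the hex timestamp decodes to the same epoch value as modtime, and filename and source are identical because the file did not come out of an archive.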