andrea: tech

More fishbucket fun

For debugging files getting re-indexed, sometimes what I want to see can only be found in the fishbucket index of the affected instance. I can pick up and move an entire index (3.x+) and drop it into another instance, but when working with the fishbucket there are a couple other things to watch out for. I don’t want anything to change it once I put it in the new instance. So I set up a throwaway instance to easily make changes I wouldn’t want to do to a real one.

REALLY BIG WARNING

Don’t do this to any Splunk instance you like. You will be unhappy later. Throw away your dummy instance when you are done so you don’t confuse anybody.

Set up a new instance of an appropriate version (the same as the original or more recent) and the appropriate architecture (ppc/sparc or intel). Get it all working with the correct ports so you don’t conflict with anything else that may be running on the machine. Since it won’t be indexing, the license doesn’t matter. Start it and then stop it so the first-run stuff is done.

What is this fishbucket thing?

It’s time for a little Indexing 101. If you look in the directory where your Splunk datastore resides (default location /opt/splunk/var/lib/splunk) you will find a directory called fishbucket. This index is not really intended for normal humans to investigate, more just Splunk engineers trying to decipher file input issues. It contains seek pointers and CRCs for the files you are indexing, so splunkd can tell if it has read them already. To see what’s there, try searching for “index=_thefishbucket”. Events look something like this:

48a304b3 initcrc::5f66db978a1ff3a3 seekcrc::bc96de428cc0b5e6 seekptr::414063 modtime::1218643123 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log

The fields are:

timestamp (epoch time, in hex)
CRC of the first 256 bytes of the file
CRC of the 256 bytes where we were last reading
seek pointer for where we are in the file
the time the file last changed
the full path to the file
the full path to the source, which is usually the same as the file but could be the archive the file came from
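
To pick one of these events apart outside of Splunk, a few lines of Python are enough. This is only a minimal sketch, assuming the layout of the sample event above (a hex timestamp followed by name::value pairs); the helper name parse_fishbucket_event is made up for illustration and is not part of Splunk.

```python
import re

# name::value pairs; a value runs until the next " name::" or the end of the line
FIELD_RE = re.compile(r"(\w+)::(.*?)(?=\s+\w+::|$)")

def parse_fishbucket_event(line):
    """Split one fishbucket event into a dict (hypothetical helper, not a Splunk API)."""
    hex_ts, _, rest = line.strip().partition(" ")
    record = {"timestamp": int(hex_ts, 16)}  # first token is epoch time in hex
    record.update(FIELD_RE.findall(rest))
    # seek pointer and modification time are plain decimal integers
    for key in ("seekptr", "modtime"):
        if key in record:
            record[key] = int(record[key])
    return record

sample = ("48a304b3 initcrc::5f66db978a1ff3a3 seekcrc::bc96de428cc0b5e6 "
          "seekptr::414063 modtime::1218643123 "
          "filename::/var/log/apache2/feorlen_org_access_log "
          "source::/var/log/apache2/feorlen_org_access_log")
event = parse_fishbucket_event(sample)
print(event["timestamp"], event["seekptr"], event["filename"])
```

For the sample event, the hex timestamp 48a304b3 decodes to 1218643123, the same value as modtime.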

Forcing dashboard refresh

In 3.2.x and 3.3.x, dashboards refresh automatically on their own schedule: 10% of the time period or 1 hour, whichever is sooner. You can’t change this right now. But if you want to force a refresh, you can delete the files that contain the cached data.

Dashboards create username_* files in $SPLUNK_HOME/var/run/splunk to persist the dashboard data. There is also a directory for each username with *.csv files. Delete the username_* files (like “admin_KB indexed per hour last 24 hours”) and the *.csv files and the next time you refresh the dashboard, it will reload.

This is not an elegant solution by any means, but it does work. While you could just delete the files for the search in question, there is no simple way to identify which csv file is associated with it. Just don’t go messing with the other files in this directory. You will be Very Unhappy if you do.
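
If you do this often, the cleanup can be scripted. The following is a rough sketch in Python, assuming only the file layout described above (username_* files in the run directory, plus *.csv files in a directory named after the user). The function names are hypothetical, and it defaults to a dry run so nothing is deleted until you ask for it.

```python
from pathlib import Path

def find_dashboard_cache(run_dir, username):
    """List cached dashboard files for one user (hypothetical helper).

    Returns the username_* files in run_dir plus the *.csv files in the
    per-user directory. Nothing else in run_dir is touched.
    """
    run_dir = Path(run_dir)
    targets = [p for p in run_dir.glob(f"{username}_*") if p.is_file()]
    user_dir = run_dir / username
    if user_dir.is_dir():
        targets += [p for p in user_dir.glob("*.csv") if p.is_file()]
    return sorted(targets)

def clear_dashboard_cache(run_dir, username, dry_run=True):
    """Delete the files found above; with dry_run=True only report them."""
    targets = find_dashboard_cache(run_dir, username)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

Calling clear_dashboard_cache(run_dir, "admin") with run_dir pointing at $SPLUNK_HOME/var/run/splunk returns the list of files it would remove; pass dry_run=False once the list looks right.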

Talk to Splunk from WordPress

I wrote a WordPress plugin (tested for 2.5.1) that displays my most recent Google search terms in my sidebar. It was an experiment with using the Splunk REST API and the PHP SDK.

You can configure the widget from the Widgets page, and it supports multiple instances with different configurations. Right now the actual search string is hardcoded because I’m doing some extra mangling to get the search terms the way I want anyway, but I’ll be adding that to the configuration options too. Eventually there will be a way to cache results so you don’t run the search each time the page is loaded.

Since there is still work to do to make it more generic, I haven’t uploaded it to the WordPress site. But here is the basic PHP code to play around with. In fine programming tradition, I learned quite a lot by picking apart existing WordPress widgets, in this case Random Image and Twitter Tools. This widget requires the Splunk PHP SDK; by default my code expects it to be in the same directory (which is probably going to be something like wp/wp-content/plugins/widgetname). There are a few things it depends on; you can find the details at the Google Code page.

What is it doing?

Up here in SupportLand, I get a lot of questions about how to understand the various bits of information that Splunk itself is tracking. The past couple of versions have added several new things to make it easier to see what is going on. Here are some of the things you can look at.

audit.log

New in 3.2, the audit.log records who did what based on what capability was requested from the authorization system. It shows both user-initiated actions like login and automated actions like running saved searches.

Login
07-14-2008 10:59:09.434 INFO AuditLogger - Audit:[timestamp=Mon Jul 14 10:59:09 2008, user=admin, action=login attempt, info=succeeded][n/a]

Running a script
07-14-2008 10:59:12.542 INFO AuditLogger - Audit:[timestamp=Mon Jul 14 10:59:12 2008, user=admin, action=run_script_sendemail, info=granted ][n/a]

Dispatch search
07-14-2008 14:43:39.619 INFO AuditLogger - Audit:[timestamp=Mon Jul 14 14:43:39 2008, user=admin, action=search, info=granted dispatch maxtime=0 maxresults=100 [search sudo | eval sizeof=length(host) ] | outputcsv][n/a]

REST request
07-15-2008 08:21:33.576 INFO AuditLogger - Audit:[timestamp=Tue Jul 15 08:21:33 2008, user=admin, action=search, info=granted REST: /search/jobs][n/a]
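
If you want to slice these entries up outside of Splunk, the Audit:[...] part has a regular enough shape to parse. Here is a minimal sketch in Python, assuming every line follows the timestamp/user/action/info layout of the examples above. The pattern and the function name are illustrative guesses based on those four lines, not a documented format.

```python
import re

# info is matched greedily so that brackets inside a search string
# (e.g. "[search sudo ...]") stay part of the info field
AUDIT_RE = re.compile(
    r"Audit:\[timestamp=(?P<timestamp>.*?), user=(?P<user>.*?), "
    r"action=(?P<action>.*?), info=(?P<info>.*)\]\[[^\]]*\]\s*$"
)

def parse_audit_line(line):
    """Return the audit fields as a dict, or None if the line does not match."""
    match = AUDIT_RE.search(line)
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}

login = ("07-14-2008 10:59:09.434 INFO AuditLogger - Audit:[timestamp=Mon Jul 14 "
         "10:59:09 2008, user=admin, action=login attempt, info=succeeded][n/a]")
fields = parse_audit_line(login)
print(fields["user"], "|", fields["action"], "|", fields["info"])
```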

license_audit.log

overriding default syslog host extraction

I had a customer recently ask how to change the host that was applied to a particular set of incoming events. Normally this wouldn’t be a big deal, just specify the new name in inputs.conf. But this is from syslog. When you set one of the syslog sourcetypes there is some extra processing to extract the correct hostname which overrides other settings. And the hostname in the event is wrong.

So to get the right one, I set up this transform to force it to a specified value while still giving it my correct syslog sourcetype.
My inputs.conf is tailing an entire directory, which for sake of demonstration I’m going to pretend is all syslog.

$ more inputs.conf
host = support09.splunk.com
[tail:///var/log]
disabled = false
host = support09.splunk.com
sourcetype = syslog

props.conf is specifying a transform only for the source of interest:

$ more props.conf
[source::/var/log/system.log]
# note: overriding default syslog transform!
TRANSFORMS = feorlenhost

and transforms.conf is defining what to do to it. I have to specify a REGEX, but I’m not actually using it so I’ll just say ‘.’ to match everything. The FORMAT line is what is going to set my host:

Digging into metrics.log

Occasionally people ask for help in identifying a rogue data input that is suddenly spewing events. If it’s hidden in a ton of similar data, it can be difficult to sort out which one is actually the problem. One place to look is the Splunk internal metrics.log. You can find it by searching the internal index (add “index=_internal” to your search) or just look in the file itself (located in $SPLUNK_HOME/var/log/splunk).

Before I get into what can be found there, I need to explain what metrics.log is not. It is a sampling over 30 second intervals, so it will not give you an exact accounting of all your inputs. For each type of item reported, you get the top ten hot sources over the interval, based on the size of the event (_raw). These numbers differ from those reported by LicenseManager, which include the indexed fields. Also, the default configuration only keeps the metrics data in the internal index for a few days, but by going to the files you can see trends over a period of months if your rolled files go that far back.
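
To see why a top-ten sample is not an exact accounting, here is a toy model in Python. It is only a sketch of the idea: the source names and byte counts are made up, and the function is not how splunkd actually computes its metrics.

```python
from collections import Counter

def top_sources(interval_bytes, limit=10):
    """Return the `limit` largest sources for one sampling interval."""
    return Counter(interval_bytes).most_common(limit)

# Hypothetical 30 second interval: 12 sources, sizes in bytes of _raw (made-up numbers)
interval = {f"/var/log/app{i}.log": 1000 * (i + 1) for i in range(12)}

reported = top_sources(interval)
reported_total = sum(size for _, size in reported)
actual_total = sum(interval.values())
print(len(reported), reported_total, actual_total)
```

With twelve active sources only ten are reported, so the two quietest ones never appear and the reported total (75000 bytes here) falls short of the real one (78000 bytes).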

A typical metrics.log has stuff like this:

conf files, part 2

Here are a couple more of my conf files explained. First the simple one:

server.conf

[sslConfig]
enableSplunkSearchSSL = true

All this says is that I’m using SSL on the front end. I clicky clicky the nice UI control and it magically happens. There could be a pile of other stuff in here, like specifying real paid-money-for certs if I were using any. But I’m not. Self-signed works for me, even if it means my users get whiny messages from their browsers. Whatever.

access_controls.conf

[roles]
apache2 = source::/var/log/apache2

[groups]
hosted_user = apache2

[users]
user1 = hosted_user

I added some access controls to help out one of my novice users, somebody who maintains the content on several sites but isn’t much of a sysadmin. I set up a role that only allows access to the apache logs and assigned it to the group hosted_user, which is then specified for user1. I thought about giving her access to just the files she needs, but that would mean specifying each of them individually, either in multiple roles or in a single role with a bunch of OR terms.
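
The chain from user to group to role can be followed mechanically. The sketch below uses Python's configparser to walk it for the file shown above. It only models the lookup as described here (one group per user, one role per group) and is not how Splunk itself evaluates access controls; the function name is hypothetical.

```python
import configparser

ACCESS_CONTROLS = """
[roles]
apache2 = source::/var/log/apache2

[groups]
hosted_user = apache2

[users]
user1 = hosted_user
"""

def filter_for_user(conf_text, user):
    """Follow user -> group -> role and return the role's filter string."""
    # Only "=" is a delimiter, so the "::" inside the value is left alone
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(conf_text)
    group = parser["users"][user]
    role = parser["groups"][group]
    return parser["roles"][role]

print(filter_for_user(ACCESS_CONTROLS, "user1"))  # source::/var/log/apache2
```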

conf file 101, part 1

I’m going over some stuff for the new support engineers, so I thought it would be useful to put it in a blog post. As an example of what you can do with conf files, I’ve got the changes I make to my own configuration and why. This is focused more on 3.1.x than on preview, but I’m basically using the same configuration in both so far. For public consumption I’ve changed some names, but otherwise these are the contents of my conf files.

This first post is about inputs.conf, props.conf and transforms.conf, the basics of event handling.

inputs.conf

host = myhost

[tail:///Library/Logs/CrashReporter]
disabled = false
sourcetype = crashreporter

[tail:///Library/Logs/MySQL.log]
disabled = false

[tail:///Library/Logs/Software Update.log]
disabled = false

[tail:///Library/Logs/DirectoryService]
disabled = false

[tail:///var/log]
disabled = false

I added the tail on /var/log from the UI but the rest of this I did by hand. That wasn’t strictly necessary, but it was easier for me to add a couple stanzas at once that way. “host = myhost” is setting the name of my machine so everything has the correct hostname even if something in the actual event might make it get set to something else. (syslog type events are the usual offender for me, even if I’m not actually getting syslog from another host. Some tend to show up as “www” if I’m not paying attention.) CrashReporter, MySQL.log, Software Update.log and DirectoryService are things specifically in /Library/Logs that I wanted. I needed to set the sourcetype manually for crashreporter, so I just listed the others while I was at it.
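
The effect of that leading “host = myhost” line can be illustrated with a short Python sketch. It assumes the setting acts as a default inherited by every stanza, and it uses configparser's DEFAULT section as a stand-in for that behavior. This is a model for reading the file, not Splunk's own parser, and only two of the stanzas are reproduced.

```python
import configparser

INPUTS_CONF = """
host = myhost

[tail:///Library/Logs/CrashReporter]
disabled = false
sourcetype = crashreporter

[tail:///var/log]
disabled = false
"""

def load_inputs(conf_text):
    """Parse inputs.conf-style text, treating leading settings as defaults."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    # Put the settings that precede the first stanza under DEFAULT
    # so every stanza inherits them
    parser.read_string("[DEFAULT]\n" + conf_text)
    return parser

inputs = load_inputs(INPUTS_CONF)
for stanza in inputs.sections():
    print(stanza, inputs[stanza]["host"], inputs[stanza].get("sourcetype", "(not set)"))
```

Both stanzas report myhost as their host, and only the CrashReporter stanza carries an explicit sourcetype.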

getting my existing index into preview

Preview is out the door, woohoo! Up here in support I’ve been busy with the existing versions, so I hadn’t checked out many of the new features. I wanted to mess with real data I care about, so I figured I’d copy my existing index and drop it into my splunkpreview directory. I host a handful of domains at home (on Leopard Server) and I’m using Splunk to watch various things I want to know, like who’s commenting on my blog and how many dictionary attacks I’ve had today. I thought it would be nifty to look at the same data in both 3.1.3 (my current production version) and preview.

The first time I tried it, I thought I’d be clever and set it all up before first startup with my whole index, users, saved searches and basically everything. Because, well, I clone this stuff all the time between 3.1.x versions when I’m setting up repro environments for customer issues. Wrong! Not sure what I forgot, but for my efforts I got a nice big segfault. Well, nothing a little rm won’t fix.