andrea: tech

40 Days of 4.0: So you want to write an app

With the previous setup, here’s what I want for my app:

A dashboard with a couple pretty pictures and some top N lists
Saved searches for advanced users to explore further
It should work for all my users with whatever indexes they have access to

I’m going to start with the sample_app template available in Manager and add what I want. Then I’ll clean up the sample stuff I don’t need. So the first step is to create a new app in Manager->Apps. Give it a name and an optional label and select “sample_app” as the template. I don’t have any additional files to upload now, so I’ll leave that alone. Save and I’m back to the list of installed apps.

On the filesystem, a bunch of things just happened. The directory MyGreatApp was created, containing a complete app structure and sample files, enough to have a functioning app. These files are all based on simplified XML that hides much of the complexity of the underlying full XML format. This makes it easier to build views, but has limitations. (For more on this see the docs: Simple Dashboards)

Some highlights:

List indexes on the main dashboard

If you are comfortable editing XML, here’s a handy hack to get the list of your default indexes in the “All indexed data” dashboard. It will show whatever the logged-in user has access to.
If you are using the standard dashboards from the Search app, do this:

Go to $SPLUNK_HOME/etc/apps/search/default/data/ui/views
Copy dashboard.xml to $SPLUNK_HOME/etc/apps/search/local/data/ui/views
Change the permissions on the file so you can edit it
Right before the last </view> tag at the end insert this XML:

  <module name="HiddenSearch" layoutPanel="panel_row2_col1_grp4" group="All indexed data" autoRun="True">
    <param name="search">| eventcount summarize=false index=* -count</param>
    <module name="SimpleResultsHeader">
      <param name="entityName">results</param>
      <param name="headerFormat">Indexes (%(count)s)</param>
      <module name="Paginator">
        <param name="count">20</param>
        <param name="entityName">results</param>
        <param name="maxPages">10</param>
        <module name="LinkList">
          <param name="initialSortDir">desc</param>
          <param name="labelFieldSearch">*</param>
          <param name="valueField">count</param>
          <param name="labelField">index</param>
          <param name="labelFieldTarget">flashtimeline</param>
          <param name="initialSort">count</param>
        </module>
      </module>
    </module>
  </module>

Save the file.
Back in the UI, click the Splunk logo to refresh the search app.

Presto! Now there is a new column showing indexes. If something didn’t work right, just remove the file you created. This file won’t be overwritten on upgrade, so if in the future there is a change to the search app you will still have this version because files in local take precedence.
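
If you would rather script the copy-and-edit steps than do them by hand, here is a minimal Python sketch. The function names and the idea of patching the file as plain text are my own assumptions for illustration, not something Splunk ships. It also does not touch file permissions, so you may still need to fix those yourself.

```python
from pathlib import Path


def insert_before_last_view(xml_text: str, snippet: str) -> str:
    """Return xml_text with snippet inserted right before the final </view> tag."""
    closing = "</view>"
    pos = xml_text.rfind(closing)  # last occurrence, as in the manual steps
    if pos == -1:
        raise ValueError("no </view> tag found")
    return xml_text[:pos] + snippet.rstrip("\n") + "\n" + xml_text[pos:]


def make_local_copy(default_file: Path, local_file: Path, snippet: str) -> None:
    """Write a patched copy of default_file to local_file (hypothetical helper).

    The original under default/ is only read, never modified.
    """
    local_file.parent.mkdir(parents=True, exist_ok=True)
    patched = insert_before_last_view(default_file.read_text(), snippet)
    local_file.write_text(patched)
```

Call make_local_copy() with the default/ and local/ paths to dashboard.xml from the steps above and the module XML as the snippet. Removing the local file undoes the change.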

Getting started with 4.0 apps

I’ve been working on some apps for 4.0 and finally I can talk details. Over the next couple of posts I’ll walk through creating a simple app using the new UI tools and a little XML. This is all based on the Apache logs on my server, so first a little background on how I’ve configured my 4.0 instance.

I have a typical small server whose primary purpose is to host a dozen or so low traffic websites. One site gets half my hits, three more most of the rest and the stragglers round out the lot attracting bots. Each virtual host has separate access_log and error_log files but all use the same format: access_common.

To take advantage of the new multi-index search in Splunk 4, I’ve set up my instance to use different indexes for various sources. In my case, it’s by person, as I have several groups of sites managed by a particular admin. The indexes are named www_something so as the overall administrator I can search across all of them with “index=www_*” and still not have to touch the other system events I’ve got going into the main index. I have also set up roles so each admin sees only the relevant data (and isn’t confused by the rest.) All the config is explained in the docs, so I won’t go over it right now.

inputcsv to restrict a search by a list of field values

A customer asked about a complicated search that could be vastly simplified by using inputcsv to input a list of values from a file, a feature added for 3.3.x. It’s documented as an internal search command here:

http://www.splunk.com/doc/latest/user/UnsupportedCommands#inputcsv

We are talking about promoting it to public, so while it says unsupported it does work. Here’s how:

I’ve got events from my webserver for my new domain and I want to see what real hits it’s getting and not my own. They look like this:


66.249.70.86 - - [23/Oct/2008:01:42:21 -0700] "GET /category/admin/ HTTP/1.1" 200 5158 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

And I’ve gotten some traffic already:


$ ./splunk dispatch 'source=/var/log/apache2/mynewdomain_access_log | stats count'
count
-----
11424

It’s a standard format that was automatically recognized as sourcetype access_common, so the extracted field “clientip” is already there. I create a csv file containing the values I want to exclude like this:


clientip
xxx.xxx.xxx.xxx
yyy.yyy.yyy.yyy
zzz.zzz.zzz.zzz

This file needs to exist relative to $SPLUNK_HOME/var/run/splunk, so to avoid specifying a path in my search I’ll just put it there. Note that I could also have used xxx.xxx.xxx.* if I wanted to; wildcards are OK.
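
To make the intent of the exclusion list concrete, here is a small Python sketch that reads a CSV in the same shape and checks an address against it, wildcards included. This only illustrates the filtering I’m after. It is not how Splunk implements inputcsv, and the addresses and function names are hypothetical.

```python
import csv
import io
from fnmatch import fnmatchcase

# Hypothetical exclusion list in the same layout as the file above:
# a header row naming the field, then one value (or wildcard) per line.
EXCLUDE_CSV = """clientip
192.0.2.10
198.51.100.*
"""


def read_patterns(csv_text, field="clientip"):
    """Return the non-empty values of the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[field] for row in reader if row.get(field)]


def is_excluded(clientip, patterns):
    """True if clientip matches any literal value or * wildcard pattern."""
    return any(fnmatchcase(clientip, pattern) for pattern in patterns)


patterns = read_patterns(EXCLUDE_CSV)
print(is_excluded("198.51.100.7", patterns))  # matched by the wildcard row
print(is_excluded("66.249.70.86", patterns))  # not in the list, so it is kept
```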

Enabling debug messages

Splunk spits out an astounding number of its own internal log messages, some I’ve already described. This post is how to get more of them, in case you have spare disk space lying around and need something to fill it with. Or you have some problem with Splunk and need debug logs. Sometimes Support will ask for this to diagnose an issue.

splunkd log messages go in the file splunkd.log. (Note that if you move the existing file out of the way, a fresh one is created on startup if you want to work with only the messages from the current run.) They are controlled by the log.cfg file located in /opt/splunk/etc, which specifies the log level of messages by category:

rootCategory=WARN,A1
category.LicenseManager=INFO
category.TcpOutputProc=INFO
category.TcpInputProc=INFO
category.UDPInputProcessor=INFO

Messages can be set to, in order of severity: DEBUG, INFO, WARN, FATAL, CRIT. Setting a log level gets you messages at that level and higher, so default settings are typically INFO or WARN. When you change something in this file, you need to restart Splunk for it to take effect. When you restart with the --debug flag, it uses a similar file, log-debug.cfg, with a different set of settings for DEBUG messages. Not everything is set to DEBUG, because some of the categories are very chatty.
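
Here is a rough Python sketch of the “that level and higher” rule, using the settings shown above. The level order is the one listed in this post. Treating the first value of rootCategory as the default level (and A1 as an appender name) is my assumption based on the log4j-style syntax, and the helper names are made up for illustration.

```python
# Level order as listed in this post, lowest severity first.
LEVELS = ["DEBUG", "INFO", "WARN", "FATAL", "CRIT"]

LOG_CFG = """rootCategory=WARN,A1
category.LicenseManager=INFO
category.TcpOutputProc=INFO
category.TcpInputProc=INFO
category.UDPInputProcessor=INFO
"""


def parse_log_cfg(text):
    """Return (root_level, {category: level}) from log.cfg-style lines."""
    root = None
    categories = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key == "rootCategory":
            # Assumption: the first comma-separated value is the level.
            root = value.split(",")[0].strip()
        elif key.startswith("category."):
            categories[key[len("category."):]] = value.strip()
    return root, categories


def is_logged(message_level, configured_level):
    """A message is written if it is at or above the configured level."""
    return LEVELS.index(message_level) >= LEVELS.index(configured_level)


root, categories = parse_log_cfg(LOG_CFG)
level = categories.get("TcpInputProc", root)  # assumed fallback to the root level
print(level, is_logged("DEBUG", level), is_logged("WARN", level))
```

With TcpInputProc at INFO, a DEBUG message is dropped and a WARN message is kept, which is why turning on DEBUG for a chatty category fills the disk so quickly.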

Index ICU: Assertion `_sourceMetaData != __null’ failed, part 1

There you were, merrily going along and Boom! Somebody kicks the power switch, your filesystem goes off the deep end, something Very Bad happens. You start to understand why fsck is a four-letter word. After using some additional four-letter words, you get things up and running. But what’s with Splunk? It won’t start!? You only get some cryptic error and “Splunkd appears to be down.” Welcome to the world of WordData. You had a backup, right? Yeah, thought so.

Buried deep in the index are a bunch of *.data files:

www.feorlen.org[feorlen]:/Applications/splunk/var/lib/splunk/defaultdb/db$ ls -lr *.data
-rw-r--r-- 1 root admin 10276 Sep 3 07:41 Sources.data
-rw-r--r-- 1 root admin 5085 Sep 3 07:41 SourceTypes.data
-rw-r--r-- 1 root admin 252 Sep 3 07:41 Hosts.data
-rw-r--r-- 1 root admin 21 Jul 26 19:19 EventTypes.data

You will find them in every bucket. They contain event counts for sources, sourcetypes, hosts and event types, along with some timerange info. During indexing, these are constantly being updated. They are supposed to look something like this (note my timestamping oops there for host::grumpy):

$ more Hosts.data
0 0 2147483647 0 0
1 host::grumpy 11194556 900458000 1231448496 1220453014
2 host::www 1953184 1194131619 1220452994 1220452994
3 host::www.feorlen.org 2350 1207761050 1216665145 1216665145
4 host::localhost 7482 1203904810 1217973661 1217973661
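
To eyeball the timerange columns, here is a small Python sketch that parses the sample above and prints the last three numbers as dates. The column meanings are my guess from the sample (an id, the key, an event count, then three epoch times). The file format is not documented here, so treat the field names as assumptions.

```python
from datetime import datetime, timezone

HOSTS_DATA = """0 0 2147483647 0 0
1 host::grumpy 11194556 900458000 1231448496 1220453014
2 host::www 1953184 1194131619 1220452994 1220452994
3 host::www.feorlen.org 2350 1207761050 1216665145 1216665145
4 host::localhost 7482 1203904810 1217973661 1217973661
"""


def parse_data_file(text):
    """Parse key rows; the first summary line has a different shape and is skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6 or "::" not in parts[1]:
            continue
        rows.append({
            "id": int(parts[0]),
            "key": parts[1],
            "count": int(parts[2]),                 # assumed: event count
            "times": [int(p) for p in parts[3:]],   # assumed: epoch seconds
        })
    return rows


def as_utc_date(epoch):
    """Format epoch seconds as a UTC calendar date."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d")


for row in parse_data_file(HOSTS_DATA):
    print(row["key"], row["count"], [as_utc_date(t) for t in row["times"]])
```

Printed this way, the host::grumpy row stands out immediately: its time columns span from 1998 to 2009, well outside the range of the other hosts.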

More fishbucket fun

For debugging files getting re-indexed, sometimes what I want to see can only be found in the fishbucket index of the affected instance. I can pick up and move an entire index (3.x+) and drop it into another instance, but when working with the fishbucket there are a couple other things to watch out for. I don’t want anything to change it once I put it in the new instance. So I set up a throwaway instance to easily make changes I wouldn’t want to do to a real one.

REALLY BIG WARNING

Don’t do this to any Splunk instance you like. You will be unhappy later. Throw away your dummy instance when you are done so you don’t confuse anybody.

Set up a new instance of an appropriate version (the same as the original or more recent) and the appropriate architecture (ppc/sparc or intel). Get it all working with the correct ports so you don’t conflict with anything else that may be running on the machine. Since it won’t be indexing, the license doesn’t matter. Start and then stop it so the first-run stuff is done.

What is this fishbucket thing?

It’s time for a little Indexing 101. If you look in the directory where your Splunk datastore resides (default location /opt/splunk/var/lib/splunk) you will find a directory called fishbucket. This index is not really intended for normal humans to investigate, more just Splunk engineers trying to decipher file input issues. It contains seek pointers and CRCs for the files you are indexing, so splunkd can tell if it has read them already. To see what’s there, try searching for “index=_thefishbucket”. Events look something like this:

48a304b3 initcrc::5f66db978a1ff3a3 seekcrc::bc96de428cc0b5e6 seekptr::414063 modtime::1218643123 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log

The fields are:

timestamp (epoch time, in hex)
CRC of the first 256 bytes of the file
CRC of the 256 bytes where we were last reading
seek pointer for where we are in the file
the time the file last changed
the full path to the file.
the full path to the source, which is usually the same as the file but could be the archive the file came from.
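
The field list above maps directly onto a simple parser. Here is a minimal Python sketch that splits the sample event into a dictionary and decodes the hex timestamp. The function name is hypothetical, and splitting on whitespace assumes the paths contain no spaces, which a real file name might.

```python
from datetime import datetime, timezone

EVENT = (
    "48a304b3 initcrc::5f66db978a1ff3a3 seekcrc::bc96de428cc0b5e6 "
    "seekptr::414063 modtime::1218643123 "
    "filename::/var/log/apache2/feorlen_org_access_log "
    "source::/var/log/apache2/feorlen_org_access_log"
)


def parse_fishbucket_event(event):
    """Split a fishbucket event into its fields.

    The first token is the timestamp as hex epoch seconds; the rest are
    key::value pairs. Assumes no whitespace inside any value.
    """
    tokens = event.split()
    record = {"timestamp": int(tokens[0], 16)}
    for token in tokens[1:]:
        key, _, value = token.partition("::")
        record[key] = value
    record["seekptr"] = int(record["seekptr"])
    record["modtime"] = int(record["modtime"])
    return record


record = parse_fishbucket_event(EVENT)
print(record["timestamp"], record["modtime"], record["seekptr"])
print(datetime.fromtimestamp(record["timestamp"], tz=timezone.utc).isoformat())
```

For this event the decoded hex timestamp comes out equal to the modtime value, 1218643123.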

Forcing dashboard refresh

In 3.2.x and 3.3.x, dashboards refresh automatically on their own schedule: 10% of the time period or 1 hour, whichever is sooner. You can’t change this right now. But if you want to force a refresh, you can delete the files that contain the cached data.

Dashboards create username_* files in $SPLUNK_HOME/var/run/splunk to persist the dashboard data. There is also a directory for each username with *.csv files. Delete the username_* files (like “admin_KB indexed per hour last 24 hours”) and the *.csv files and the next time you refresh the dashboard, it will reload.

This is not an elegant solution by any means, but it does work. While you could just delete the files for the search in question, there is no simple way to identify which csv file is associated with it. Just don’t go messing with the other files in this directory; you will be Very Unhappy if you do.
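
If you do this often, a script that touches only the files described above is safer than deleting by hand. Here is a minimal Python sketch. The function names are hypothetical, and it defaults to a dry run that only lists what it would delete, given the warning about the other files in that directory.

```python
from pathlib import Path


def find_dashboard_cache(run_dir, username):
    """List the username_* files plus the *.csv files in the user's directory."""
    run_dir = Path(run_dir)
    targets = [p for p in run_dir.glob(f"{username}_*") if p.is_file()]
    user_dir = run_dir / username
    if user_dir.is_dir():
        targets += [p for p in user_dir.glob("*.csv") if p.is_file()]
    return sorted(targets)


def clear_dashboard_cache(run_dir, username, dry_run=True):
    """Return the cache files found; delete them only when dry_run is False."""
    targets = find_dashboard_cache(run_dir, username)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

Point run_dir at $SPLUNK_HOME/var/run/splunk, check the returned list, and only then call it again with dry_run=False.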

Talk to Splunk from WordPress

I wrote a WordPress plugin (tested for 2.5.1) that displays my most recent Google search terms in my sidebar. It was an experiment with using the Splunk REST API and the PHP SDK.

You can configure the widget from the Widgets page and it supports multiple instances with different configuration. Right now the actual search string is hardcoded because I’m doing some extra mangling to get the search terms the way I want anyway, but I’ll be adding that to the configuration options also. Eventually there will be a way to cache results so you don’t do the search each time the page is loaded.

Since there is still work to do to make it more generic, I haven’t uploaded it to the WordPress site. But here is the basic PHP code to play around with. In fine programming tradition, I learned quite a lot by picking apart existing WordPress widgets, in this case Random Image and Twitter Tools. This widget requires the Splunk PHP SDK. By default my code expects it to be in the same directory (which is probably going to be something like wp/wp-content/plugins/widgetname). There are a few things it depends on; you can find the details at the Google Code page.