andrea: metrics

What is it doing?

Up here in SupportLand, I get a lot of questions about how to understand the various bits of information that Splunk itself is tracking. The past couple of versions have added several new things to make it easier to see what is going on. Here are some of the things you can look at.

audit.log

New in 3.2, the audit.log records who did what based on what capability was requested from the authorization system. It shows both user-initiated actions like login and automated actions like running saved searches.

Login
07-14-2008 10:59:09.434 INFO AuditLogger - Audit:[timestamp=Mon Jul 14 10:59:09 2008, user=admin, action=login attempt, info=succeeded][n/a]

Running a script
07-14-2008 10:59:12.542 INFO AuditLogger - Audit:[timestamp=Mon Jul 14 10:59:12 2008, user=admin, action=run_script_sendemail, info=granted ][n/a]

Dispatch search
07-14-2008 14:43:39.619 INFO AuditLogger - Audit:[timestamp=Mon Jul 14 14:43:39 2008, user=admin, action=search, info=granted dispatch maxtime=0 maxresults=100 [search sudo | eval sizeof=length(host) ] | outputcsv][n/a]

REST request
07-15-2008 08:21:33.576 INFO AuditLogger - Audit:[timestamp=Tue Jul 15 08:21:33 2008, user=admin, action=search, info=granted REST: /search/jobs][n/a]

license_audit.log

Digging into metrics.log

Occasionally people ask for help in identifying a rogue data input that is suddenly spewing events. If it’s hidden in a ton of similar data it can be difficult to sort out which one is actually the problem. One place to look is the Splunk internal metrics.log. You can find it by searching the internal index (add “index=_internal” to your search) or just look in the file itself (located in $SPLUNK_HOME/var/log/splunk.)

Before I get into what can be found there, I need to explain what metrics.log is not. It is a sampling over 30 second intervals, so it will not give you an exact accounting of all your inputs. For each type of item reported, you get the top ten hot sources over the interval, based on the size of the event (_raw.) It is different from the numbers reported by LicenseManager, which include the indexed fields. Also, the default configuration only maintains the metrics data in the internal index a few days, but by going to the files you can see trends over a period of months if your rolled files go that far back.

A typical metrics.log has stuff like this: