<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>andrea</title>
	<atom:link href="http://blogs.splunk.com/andrea/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.splunk.com/andrea</link>
	<description>Just another WordPress weblog</description>
	<pubDate>Wed, 27 Aug 2008 21:10:57 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
	<language>en</language>
			<item>
		<title>More fishbucket fun</title>
		<link>http://blogs.splunk.com/andrea/2008/08/27/more-fishbucket-fun/</link>
		<comments>http://blogs.splunk.com/andrea/2008/08/27/more-fishbucket-fun/#comments</comments>
		<pubDate>Wed, 27 Aug 2008 21:10:57 +0000</pubDate>
		<dc:creator>andrea</dc:creator>
		
		<category><![CDATA[hacks]]></category>

		<category><![CDATA[fishbucket]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/andrea/?p=403</guid>
		<description><![CDATA[For debugging files getting re-indexed, sometimes what I want to see can only be found in the fishbucket index of the affected instance. I can pick up and move an entire index (3.x+) and drop it into another instance, but when working with the fishbucket there are a couple other things to watch out for. [...]]]></description>
			<content:encoded><![CDATA[<p>For debugging files getting re-indexed, sometimes what I want to see can only be found in the fishbucket index of the affected instance. I can pick up and move an entire index (3.x+) and drop it into another instance, but when working with the fishbucket there are a couple other things to watch out for. I don&#8217;t want anything to change it once I put it in the new instance. So I set up a throwaway instance to easily make changes I wouldn&#8217;t want to do to a real one. </p>
<p><em><strong>REALLY BIG WARNING</strong></em></p>
<p><em>Don&#8217;t do this to any Splunk instance you like. You will be unhappy later. Throw away your dummy instance when you are done so you don&#8217;t confuse anybody.</em> </p>
<p>Set up a new instance of an appropriate version (the same as or more recent than the original) and the appropriate architecture (ppc/sparc or intel). Get it all working with the correct ports so you don&#8217;t conflict with anything else that may be running on the machine. Since it won&#8217;t be indexing, the license doesn&#8217;t matter. Start and then stop it so the first-run setup is done.</p>
<p>Change some things so it won&#8217;t touch the index:<br />
./splunk clean all -f<br />
rm /opt/splunk/bin/splunk_optimize<br />
rm /opt/splunk/etc/system/default/inputs.conf (or wherever it is in your version)<br />
edit /opt/splunk/etc/system/default/indexes.conf to comment out the line frozenTimePeriodInSecs = 2419200 in the [_thefishbucket] stanza</p>
<p>rm -rf /opt/splunk/var/lib/splunk/fishbucket/*<br />
copy the contents of the fishbucket index you have into the now empty directory (don&#8217;t accidentally create an extra fishbucket/fishbucket directory!)<br />
remove any archives or other temporary files you left lying around in the index directories</p>
<p>Start this instance and now you can search for index=_thefishbucket. It helps to exclude the Splunk internal files with something like this:</p>
<p>index=_thefishbucket NOT filename::/opt/splunk/var/log/splunk/license_audit.log NOT filename::/opt/splunk/var/log/splunk/metrics.log NOT filename::/opt/splunk/var/log/splunk/searchhistory.log NOT filename::/opt/splunk/var/log/splunk/splunkd.log NOT filename::/opt/splunk/var/log/splunk/splunklogger.log NOT filename::/opt/splunk/var/log/splunk/web_access.log NOT filename::/opt/splunk/var/log/splunk/web_service.log</p>
<p>Your full path may vary. What is left is all the files being monitored by the instance. </p>
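<p>If typing that search by hand gets tedious, it can be generated. This is a minimal Python sketch, not anything that ships with Splunk: the helper name fishbucket_search is made up for illustration, and the log directory and file names are simply the ones from the search above, so adjust them for your install.</p>

```python
# Hypothetical helper: builds the exclusion search shown above.
# The log directory and file names come from the example search.
INTERNAL_LOGS = [
    "license_audit.log",
    "metrics.log",
    "searchhistory.log",
    "splunkd.log",
    "splunklogger.log",
    "web_access.log",
    "web_service.log",
]


def fishbucket_search(log_dir="/opt/splunk/var/log/splunk"):
    """Return an index=_thefishbucket search that skips Splunk's own logs."""
    terms = ["index=_thefishbucket"]
    for name in INTERNAL_LOGS:
        terms.append("NOT filename::{}/{}".format(log_dir, name))
    return " ".join(terms)


print(fishbucket_search())
```

<p>Pass a different log_dir if the original instance was not installed under /opt/splunk.</p>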
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/andrea/2008/08/27/more-fishbucket-fun/feed/</wfw:commentRss>
		</item>
		<item>
		<title>What is this fishbucket thing?</title>
		<link>http://blogs.splunk.com/andrea/2008/08/14/what-is-this-fishbucket-thing/</link>
		<comments>http://blogs.splunk.com/andrea/2008/08/14/what-is-this-fishbucket-thing/#comments</comments>
		<pubDate>Thu, 14 Aug 2008 22:50:44 +0000</pubDate>
		<dc:creator>andrea</dc:creator>
		
		<category><![CDATA[tech]]></category>

		<category><![CDATA[fishbucket]]></category>

		<category><![CDATA[indexing]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/andrea/?p=402</guid>
		<description><![CDATA[It&#8217;s time for a little Indexing 101. If you look in the directory where your Splunk datastore resides (default location /opt/splunk/var/lib/splunk) you will find a directory called fishbucket. This index is not really intended for normal humans to investigate, more just Splunk engineers trying to decipher file input issues. It contains seek pointers and CRCs [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s time for a little Indexing 101. If you look in the directory where your Splunk datastore resides (default location /opt/splunk/var/lib/splunk) you will find a directory called fishbucket. This index is not really intended for normal humans to investigate, more just Splunk engineers trying to decipher file input issues. It contains seek pointers and CRCs for the files you are indexing, so splunkd can tell if it has read them already. To see what&#8217;s there, try searching for &#8220;index=_thefishbucket&#8221;. Events look something like this: </p>
<p>48a304b3 initcrc::5f66db978a1ff3a3 seekcrc::bc96de428cc0b5e6 seekptr::414063 modtime::1218643123 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log</p>
<p>The fields are:</p>
<p>timestamp (epoch time, in hex)<br />
initcrc: CRC of the first 256 bytes of the file<br />
seekcrc: CRC of the 256 bytes where we were last reading<br />
seekptr: seek pointer for where we are in the file<br />
modtime: the time the file last changed<br />
filename: the full path to the file<br />
source: the full path to the source, which is usually the same as the file but could be the archive the file came from</p>
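<p>To make the layout concrete, here is a rough Python sketch that splits the sample event into named fields. The function name is hypothetical, and it assumes seekptr is hexadecimal (other sample values contain the digits a-f) and that paths contain no spaces.</p>

```python
from datetime import datetime, timezone


def parse_fishbucket_event(line):
    """Split one fishbucket event into a dict of named fields."""
    stamp, *pairs = line.split()
    # The leading token is epoch time written in hex.
    event = {"timestamp": datetime.fromtimestamp(int(stamp, 16), tz=timezone.utc)}
    for pair in pairs:
        key, _, value = pair.partition("::")
        event[key] = value
    # Assumption: seekptr is hexadecimal, modtime is decimal epoch seconds.
    event["seekptr"] = int(event["seekptr"], 16)
    event["modtime"] = int(event["modtime"])
    return event


sample = (
    "48a304b3 initcrc::5f66db978a1ff3a3 seekcrc::bc96de428cc0b5e6 "
    "seekptr::414063 modtime::1218643123 "
    "filename::/var/log/apache2/feorlen_org_access_log "
    "source::/var/log/apache2/feorlen_org_access_log"
)
event = parse_fishbucket_event(sample)
print(event["timestamp"].isoformat(), event["seekptr"], event["filename"])
```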
<p>When the file monitor processor looks at a file, it searches the fishbucket to see if the CRC from the beginning of the file is already there. If not, the file is indexed as new. If yes, then we check the CRC of where we were reading against the saved value in seekcrc. If it matches and the file is longer than the saved seek pointer, then there is new stuff at the end to read. If the top of the file matches but the seekcrc doesn&#8217;t, or the seek pointer is beyond the current end of the file, then something in the part we have already read has changed. Since we don&#8217;t know what might have changed, we just index the whole thing. (You can control this: see CHECK_METHOD in props.conf.spec.) </p>
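<p>The decision described above can be sketched in a few lines of Python. This is only an illustration of the logic, not splunkd code: zlib.crc32 stands in for whatever checksum Splunk really uses, the saved records live in a plain dict instead of an index, and the sketch assumes the seekcrc covers the 256 bytes ending at the seek pointer.</p>

```python
import zlib

BLOCK = 256  # bytes covered by each CRC, per the description above


def crc(data):
    # Stand-in checksum: zlib.crc32 is an assumption, not Splunk's algorithm.
    return zlib.crc32(data)


def classify(content, saved):
    """Decide what to do with a file given the saved records.

    `saved` maps an initial CRC to a (seekcrc, seekptr) tuple.
    Returns "new", "unchanged", "append" or "reindex".
    """
    record = saved.get(crc(content[:BLOCK]))
    if record is None:
        return "new"  # beginning of the file has never been seen
    seekcrc, seekptr = record
    if seekptr > len(content):
        return "reindex"  # file is shorter than where we stopped reading
    if crc(content[max(0, seekptr - BLOCK):seekptr]) != seekcrc:
        return "reindex"  # something in the part already read has changed
    return "append" if len(content) > seekptr else "unchanged"


def remember(content, saved):
    """Record the current end of the file as the new seek position."""
    end = len(content)
    saved[crc(content[:BLOCK])] = (crc(content[max(0, end - BLOCK):end]), end)


saved = {}
log = b"first line\n" * 40
print(classify(log, saved))
remember(log, saved)
print(classify(log + b"more\n", saved))
```

<p>Truncating the file or editing bytes just before the saved seek pointer makes classify return "reindex", which is the re-indexing case discussed here.</p>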
<p>If you want to track what is happening with a particular file, you can search for all the events in the fishbucket associated with it by the file or source name (like source::/var/log/apache2/feorlen_org_access_log.) If you check the seekptr and the modtime, they will only be increasing with time (note that events are returned most recent first, so this list is newest to oldest.) </p>
<p>48a3084d initcrc::5f66db978a1ff3a3 seekcrc::3e746e9f66897965 seekptr::414a40 modtime::1218644042 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a307d9 initcrc::5f66db978a1ff3a3 seekcrc::77f6d8313fc689ba seekptr::41419b modtime::1218643929 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a3062e initcrc::5f66db978a1ff3a3 seekcrc::2cc30b86b37c646 seekptr::4140fc modtime::1218643502 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a304b3 initcrc::5f66db978a1ff3a3 seekcrc::bc96de428cc0b5e6 seekptr::414063 modtime::1218643123 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a300d3 initcrc::5f66db978a1ff3a3 seekcrc::8db2f52ef6f75c91 seekptr::413fa4 modtime::1218642130 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a2fc7a initcrc::5f66db978a1ff3a3 seekcrc::881375418e194bd5 seekptr::413f06 modtime::1218640999 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a2f996 initcrc::5f66db978a1ff3a3 seekcrc::c596371ec4c573d4 seekptr::413e6c modtime::1218640260 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a2f80c initcrc::5f66db978a1ff3a3 seekcrc::2e686cf0dd2f62bb seekptr::413dce modtime::1218639883 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a2f25a initcrc::5f66db978a1ff3a3 seekcrc::b2e489862ed72c79 seekptr::413d1d modtime::1218638406 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a2f1d1 initcrc::5f66db978a1ff3a3 seekcrc::58af0c6446e96bf5 seekptr::413c7f modtime::1218638289 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a2f19d initcrc::5f66db978a1ff3a3 seekcrc::16fdb83b48965067 seekptr::413bbe modtime::1218638236 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a2f05b initcrc::5f66db978a1ff3a3 seekcrc::fbb8700a35cfdfcb seekptr::413b25 modtime::1218637915 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log<br />
48a2ebc5 initcrc::5f66db978a1ff3a3 seekcrc::ddbac21aa7386a6 seekptr::413abd modtime::1218636714 filename::/var/log/apache2/feorlen_org_access_log source::/var/log/apache2/feorlen_org_access_log</p>
<p>Anything other than this indicates a big problem with the file, like it is getting re-indexed when it shouldn&#8217;t. (Some files you do want to re-index when they change, but not normal logfiles that roll.) </p>
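<p>Eyeballing a long list for a pointer that went backwards is error prone, so here is a small Python sketch that does the check. The function names are made up for illustration, the events are the three most recent ones from the list above, and seekptr is assumed to be hexadecimal.</p>

```python
def pointers(line):
    """Return (seekptr, modtime) from one fishbucket event line."""
    fields = dict(part.split("::", 1) for part in line.split()[1:])
    # Assumption: seekptr is hexadecimal, modtime is decimal epoch seconds.
    return int(fields["seekptr"], 16), int(fields["modtime"])


def regressions(lines):
    """Return the events whose seekptr or modtime is lower than the event before.

    `lines` is newest first, the order a search returns them in.
    """
    found = []
    previous = None
    for line in reversed(lines):  # walk oldest to newest
        current = pointers(line)
        if previous is not None and (previous[0] > current[0] or previous[1] > current[1]):
            found.append(line)
        previous = current
    return found


PATH = "/var/log/apache2/feorlen_org_access_log"
TEMPLATE = (
    "{} initcrc::5f66db978a1ff3a3 seekcrc::{} seekptr::{} modtime::{} "
    "filename::" + PATH + " source::" + PATH
)
events = [
    TEMPLATE.format("48a3084d", "3e746e9f66897965", "414a40", 1218644042),
    TEMPLATE.format("48a307d9", "77f6d8313fc689ba", "41419b", 1218643929),
    TEMPLATE.format("48a3062e", "2cc30b86b37c646", "4140fc", 1218643502),
]
print(regressions(events))
```

<p>An empty list means the file is being read normally. Anything returned is an event worth a closer look.</p>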
<p>So why do I care? </p>
<p>Every Splunk instance has a fishbucket index, except the lightest of hand-tuned lightweight forwarders, and if you index a lot of files it can get quite large. As with any other index, you can change the retention policy to control the size via indexes.conf. But since it tracks what files the instance has seen, you have to consider carefully before you change the retention policy. If you retire data from the fishbucket for files that still exist on the host, it will &#8220;forget&#8221; it saw them and next time around they will get re-indexed. </p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/andrea/2008/08/14/what-is-this-fishbucket-thing/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Splunk and iPhone</title>
		<link>http://blogs.splunk.com/andrea/2008/07/28/splunk-and-iphone/</link>
		<comments>http://blogs.splunk.com/andrea/2008/07/28/splunk-and-iphone/#comments</comments>
		<pubDate>Mon, 28 Jul 2008 18:20:43 +0000</pubDate>
		<dc:creator>andrea</dc:creator>
		
		<category><![CDATA[api]]></category>

		<category><![CDATA[splunk]]></category>

		<category><![CDATA[iphone]]></category>

		<category><![CDATA[livetail]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/andrea/?p=399</guid>
		<description><![CDATA[I&#8217;ve been playing with a few things that will eventually turn into an iPhone application to talk to Splunk via the REST API. I don&#8217;t have a lot to say about it right now due to other issues but I do have a little something to show off: 

Splunk doesn&#8217;t support Safari officially yet and [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been playing with a few things that will eventually turn into an iPhone application to talk to Splunk via the REST API. I don&#8217;t have a lot to say about it right now due to <a href="http://gizmodo.com/5028374/iphone-app-devs-still-gagged-by-non+disclosure-agreement-mad-as-fn-hell-about-it">other issues</a> but I do have a little something to show off: </p>
<p><a href='http://blogs.splunk.com/andrea/wp-content/uploads/2008/07/livetail.jpg'><img src="http://blogs.splunk.com/andrea/wp-content/uploads/2008/07/livetail.jpg" alt="" title="livetail" width="150" height="74" class="alignnone size-thumbnail wp-image-400" /></a></p>
<p>Splunk doesn&#8217;t support Safari officially yet and MobileSafari is a whole &#8216;nother animal, but there are other things you can do. You can talk to the REST endpoints just fine. Here I have a Live Tail search running from the browser, talking to my production server.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/andrea/2008/07/28/splunk-and-iphone/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Forcing dashboard refresh</title>
		<link>http://blogs.splunk.com/andrea/2008/07/25/forcing-dashboard-refresh/</link>
		<comments>http://blogs.splunk.com/andrea/2008/07/25/forcing-dashboard-refresh/#comments</comments>
		<pubDate>Fri, 25 Jul 2008 17:01:16 +0000</pubDate>
		<dc:creator>andrea</dc:creator>
		
		<category><![CDATA[hacks]]></category>

		<category><![CDATA[dashboard]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/andrea/?p=398</guid>
		<description><![CDATA[In 3.2.x and 3.3.x, dashboards refresh automatically on their own schedule: 10% of the time period or 1 hour, whichever is sooner. You can&#8217;t change this right now. But if you want to force a refresh, you can delete the files that contain the cached data. 
Dashboards create username_* files in $SPLUNK_HOME/var/run/splunk to persist the [...]]]></description>
			<content:encoded><![CDATA[<p>In 3.2.x and 3.3.x, dashboards refresh automatically on their own schedule: 10% of the time period or 1 hour, whichever is sooner. You can&#8217;t change this right now. But if you want to force a refresh, you can delete the files that contain the cached data. </p>
<p>Dashboards create username_* files in $SPLUNK_HOME/var/run/splunk to persist the dashboard data. There is also a directory for each username with *.csv files. Delete the username_* files (like &#8220;admin_KB indexed per hour last 24 hours&#8221;) and the *.csv files and the next time you refresh the dashboard, it will reload. </p>
<p>This is not an elegant solution by any means, but it does work. While you could just delete the files for the search in question, there is no simple way to identify which csv file is associated with it. Just don&#8217;t go messing with the other files in this directory, you will be Very Unhappy if you do.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/andrea/2008/07/25/forcing-dashboard-refresh/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Talk to Splunk from WordPress</title>
		<link>http://blogs.splunk.com/andrea/2008/07/15/talk-to-splunk-from-wordpress/</link>
		<comments>http://blogs.splunk.com/andrea/2008/07/15/talk-to-splunk-from-wordpress/#comments</comments>
		<pubDate>Tue, 15 Jul 2008 21:20:32 +0000</pubDate>
		<dc:creator>andrea</dc:creator>
		
		<category><![CDATA[api]]></category>

		<category><![CDATA[hacks]]></category>

		<category><![CDATA[platform]]></category>

		<category><![CDATA[php]]></category>

		<category><![CDATA[rest]]></category>

		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/andrea/?p=395</guid>
		<description><![CDATA[I wrote a WordPress plugin (tested for 2.5.1) that displays my most recent Google search terms in my sidebar. It was an experiment with using the Splunk REST API and the PHP SDK.  
You can configure the widget from the Widgets page and it supports multiple instances with different configuration. Right now the actual [...]]]></description>
			<content:encoded><![CDATA[<p>I wrote a WordPress plugin (tested for 2.5.1) that displays my most recent Google search terms in my sidebar. It was an experiment with using the Splunk REST API and the PHP SDK.  </p>
<p>You can configure the widget from the Widgets page and it supports multiple instances with different configuration. Right now the actual search string is hardcoded because I&#8217;m doing some extra mangling to get the search terms the way I want anyway, but I&#8217;ll be adding that to the configuration options also. Eventually there will be a way to cache results so you don&#8217;t do the search each time the page is loaded. </p>
<p>Since there is still work to do to make it more generic, I haven&#8217;t uploaded it to the WordPress site. But here is the basic PHP code to play around with. In fine programming tradition, I learned quite a lot by picking apart existing WordPress widgets, in this case Random Image and Twitter Tools. This widget requires the Splunk PHP SDK; by default my code expects it to be in the same directory (which is probably going to be something like wp/wp-content/plugins/widgetname). It depends on a few other things, and you can find the details at the <a href="http://code.google.com/p/splunk-php-sdk/">Google Code page</a>. </p>
<p>You can find the widget here:<br />
<a href='http://blogs.splunk.com/andrea/wp-content/uploads/2008/07/splunk_statsphp1.gz'>splunk_statsphp1</a></p>
<p>Note: updated version posted 31 July 08.</p>
<p>Here&#8217;s a sample of the kinds of events I&#8217;m looking at. I have some extra field extractions because it&#8217;s a custom format and not exactly access_combined, but I get the referer in there. What I want to display is the actual search string, in this case &#8220;drum+carder&#8221;. I have to strip out the &#8216;+&#8217; between words because otherwise it doesn&#8217;t wrap nicely in my narrow sidebar. (I&#8217;m sure I could fix this in my theme somehow but Eric Meyer I&#8217;m not.)  </p>
<p>xxx.xxx.xxx.xxx [15/Jul/2008:12:08:07 -0700] "GET /tag/drum-carder/ HTTP/1.1" 200 "http://www.google.com/search?hl=en&#038;pwst=1&#038;q=drum+carder&#038;start=10&#038;sa=N"</p>
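<p>The widget itself is PHP and leans on Splunk to pull out the q= value, but the clean-up step is easy to show on its own. This Python sketch is an illustration only, not the widget code: it rebuilds a referer like the one in the sample event and uses the standard urllib.parse module to get the search string back with the '+' turned into a space. The function name search_terms is hypothetical.</p>

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Rebuild a referer like the one in the sample event. urlencode joins the
# parameters and writes the space in the query as '+'.
referer = "http://www.google.com/search?" + urlencode(
    {"hl": "en", "pwst": "1", "q": "drum carder", "start": "10", "sa": "N"}
)


def search_terms(url):
    """Return the q= search string from a referer URL, with '+' decoded to spaces."""
    values = parse_qs(urlsplit(url).query).get("q", [])
    return values[0] if values else ""


print(search_terms(referer))
```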
<p>You can go look at the code if you really want to know, but here are a few comments on what it&#8217;s doing: </p>
<p>I only want a couple results, so to make the search as fast as possible I&#8217;m limiting what I get back.<br />
        // how many results to get?<br />
        $dispatchProps['max_count'] = 3;</p>
<p>Also there&#8217;s no need to keep the default time to live, so set the timeout to something reasonable. This could even be much smaller.<br />
        // don't leave the search hanging around<br />
        $dispatchProps['timeout'] = 300;</p>
<p>It&#8217;s a pretty simple search, since the auto key/value extraction already gets the q= stuff out of the referer field.<br />
        // using head to get only what I want makes the search way faster<br />
        $job_id = $searchMgr->syncSearch('search sourcetype="spinnyspinny_access_log" google search | head 3', $dispatchProps);</p>
<p>Here&#8217;s what it looks like in my sidebar: </p>
<p><a href='http://blogs.splunk.com/andrea/wp-content/uploads/2008/07/sidebar_widget.png'><img src="http://blogs.splunk.com/andrea/wp-content/uploads/2008/07/sidebar_widget.png" alt="image of my sidebar widget installed" title="sidebar_widget" width="207" height="140" class="alignnone size-medium wp-image-397" /></a></p>
<p>If you want to see it in action, I have it installed in my personal blog at <a href="http://www.feorlen.org">http://www.feorlen.org</a>. It is pulling statistics about my other site at <a href="http://www.spinnyspinny.com">http://www.spinnyspinny.com</a>, which gets a lot of search engine hits from Google. If you want to test it, search for &#8220;spinnyspinny&#8221; and some other relevant keywords like &#8220;yarn&#8221; and you will find my site. Don&#8217;t go abusing it now, because you know that Splunk will be telling me your IP! <img src='http://blogs.splunk.com/andrea/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/andrea/2008/07/15/talk-to-splunk-from-wordpress/feed/</wfw:commentRss>
		</item>
		<item>
		<title>What is it doing?</title>
		<link>http://blogs.splunk.com/andrea/2008/07/15/what-is-it-doing/</link>
		<comments>http://blogs.splunk.com/andrea/2008/07/15/what-is-it-doing/#comments</comments>
		<pubDate>Tue, 15 Jul 2008 16:31:15 +0000</pubDate>
		<dc:creator>andrea</dc:creator>
		
		<category><![CDATA[CLI]]></category>

		<category><![CDATA[dev]]></category>

		<category><![CDATA[tech]]></category>

		<category><![CDATA[audit]]></category>

		<category><![CDATA[dispatch]]></category>

		<category><![CDATA[metrics]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/andrea/?p=394</guid>
		<description><![CDATA[Up here in SupportLand, I get a lot of questions about how to understand the various bits of information that Splunk itself is tracking. The past couple of versions have added several new things to make it easier to see what is going on. Here are some of the things you can look at. 
audit.log
New [...]]]></description>
			<content:encoded><![CDATA[<p>Up here in SupportLand, I get a lot of questions about how to understand the various bits of information that Splunk itself is tracking. The past couple of versions have added several new things to make it easier to see what is going on. Here are some of the things you can look at. </p>
<p>audit.log</p>
<p>New in 3.2, the audit.log records who did what based on what capability was requested from the authorization system. It shows both user-initiated actions like login and automated actions like running saved searches. </p>
<p>Login<br />
07-14-2008 10:59:09.434 INFO  AuditLogger - Audit:[timestamp=Mon Jul 14 10:59:09 2008, user=admin, action=login attempt, info=succeeded][n/a]</p>
<p>Running a script<br />
07-14-2008 10:59:12.542 INFO  AuditLogger - Audit:[timestamp=Mon Jul 14 10:59:12 2008, user=admin, action=run_script_sendemail, info=granted ][n/a]</p>
<p>Dispatch search<br />
07-14-2008 14:43:39.619 INFO  AuditLogger - Audit:[timestamp=Mon Jul 14 14:43:39 2008, user=admin, action=search, info=granted dispatch maxtime=0 maxresults=100 [search sudo | eval sizeof=length(host)  ] | outputcsv][n/a]</p>
<p>REST request<br />
07-15-2008 08:21:33.576 INFO  AuditLogger - Audit:[timestamp=Tue Jul 15 08:21:33 2008, user=admin, action=search, info=granted REST: /search/jobs][n/a]</p>
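<p>If you want to pull these apart outside of Splunk, the Audit:[...] section is regular enough for a simple pattern. This is a Python sketch based only on the four sample lines above; other audit events may not fit it, and the function name is made up for illustration.</p>

```python
import re

# Pattern for the Audit:[...] section of the sample lines above.
AUDIT = re.compile(r"Audit:\[timestamp=(.*?), user=(.*?), action=(.*?), info=(.*)\]\[")


def parse_audit(line):
    """Return timestamp, user, action and info from an audit.log line, or None."""
    match = AUDIT.search(line)
    if match is None:
        return None
    keys = ("timestamp", "user", "action", "info")
    return dict(zip(keys, (value.strip() for value in match.groups())))


line = (
    "07-14-2008 10:59:09.434 INFO  AuditLogger - Audit:[timestamp=Mon Jul 14 "
    "10:59:09 2008, user=admin, action=login attempt, info=succeeded][n/a]"
)
print(parse_audit(line))
```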
<p>license_audit.log</p>
<p>These are the LicenseManager events that used to be reported in splunkd.log; now they are in their own file. The things to pay attention to are quotaExceededCount (number of license violations), peak (all-time high daily volume) and todaysBytesIndexed. rolloverCount is the number of rollovers since the last clean. Usually there is one event generated a day, just after midnight, but there can be others if the instance has been restarted. </p>
<p>07-15-2008 00:01:38.456 INFO  LicenseManager-Audit - Audit:[timestamp=1216105298 quotaExceededCount=0, lastExceedDate=0, peak=14699861, rolloverCount=1, totalCumulativeBytesAtRollover=14699861, todaysBytesIndexed=14699861][Jls7bqb2G3dcwAgzAmi0P5pmJn1+IgDwMpoxmW1idMGbA1IlW2amr8tYq5ROlL3bysBxpCV46OEBCt3MJxjI73VvmGSWffU5C+1K3UXYejOLBdinoRavtk+hgLil69eF4n/vQ2mVixK179iHVkzckUcUe8X8iz8qPZT6BEvFhh0AukKlk6IFCrXWRftYysMEIR0IAmcuns7PWBzo/FmEOdm9rBKfVnNMKSvvos39QVooj4O6Km2+xsMUododll8w9IMrl9l0dDHW4AhfZfEN7Sf8krE1c/T/Q+VAxMRgzB0iqJWIddtIxgp6pmdBzD2q7dk9L2pAbkjzDlXRM5GyAg==]</p>
<p>metrics.log</p>
<p>I&#8217;ve talked about this one before, when trying to identify high volume data inputs. New for 3.3, in addition to the default 10 items per period you can configure how many items are reported in metrics.log by setting maxseries in limits.conf. (See limits.conf.spec for details.) Making this number larger will impact performance, but you can do it while investigating a specific issue. You can also reduce it. As before, it&#8217;s a sample of the top n items for each group in a 30 second period, so if you have 200 sources, you won&#8217;t see all your data inputs here. We are already talking about what other metrics we can report, so expect to see new options in 4.0. </p>
<p>Track blocked queues by looking for &#8220;blocked!!&#8221;:<br />
06-24-2008 09:22:08.792 INFO  Metrics - group=queue, name=parsingqueue, blocked!!=true, max_size=1000, filled_count=21, empty_count=0, current_size=1000, largest_size=1000, smallest_size=908</p>
<p>See which processors are actively running:<br />
07-09-2008 14:03:43.876 INFO  Metrics - group=pipeline, name=parsing, processor=utf8, cpu_seconds=0.321082, executes=90770, cumulative_hits=218992256</p>
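<p>Both kinds of line are comma-separated key=value pairs after the "Metrics - " prefix, so they are easy to split up in a script. Here is a minimal Python sketch, with hypothetical function names, that turns a line into a dict and flags blocked queues. It assumes every metrics.log line follows the layout of the two samples above.</p>

```python
def parse_metrics(line):
    """Split a metrics.log line into its key=value fields."""
    _, _, body = line.partition(" - ")
    fields = {}
    for pair in body.split(", "):
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields


def is_blocked(line):
    """True when the line reports a blocked queue."""
    fields = parse_metrics(line)
    return fields.get("group") == "queue" and fields.get("blocked!!") == "true"


queue_line = (
    "06-24-2008 09:22:08.792 INFO  Metrics - group=queue, name=parsingqueue, "
    "blocked!!=true, max_size=1000, filled_count=21, empty_count=0, "
    "current_size=1000, largest_size=1000, smallest_size=908"
)
print(is_blocked(queue_line), parse_metrics(queue_line)["name"])
```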
<p>Diagnostic searches with CLI dispatch</p>
<p>The new dispatch search allows searching across many more events than the older search command. From the CLI, you can use the dispatch command or write something that uses the REST API. Particular searches can tell you more than just returning events. Dispatch from the CLI is particularly suited to this as it&#8217;s designed for reporting across huge sets of events (although not to return those hundreds of thousands of events.) It may take a while to run, but it will complete. </p>
<p>How many events?<br />
./splunk dispatch "sourcetype=access_combined | stats count"</p>
<p>How big are they?<br />
./splunk dispatch "host=foohost1 | eval sizeof=length(_raw) | stats sum(sizeof)"</p>
<p>How big are various other things?<br />
./splunk dispatch "sourcetype=syslog | eval sizeof=length(host) | stats avg(sizeof)"</p>
<p>Note that all of these use additional search commands to report on the set of events rather than the events themselves. Actual results returned from dispatch via the CLI are maintained in memory, so trying to get back thousands of events or more can cause serious problems. Don&#8217;t do it. </p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/andrea/2008/07/15/what-is-it-doing/feed/</wfw:commentRss>
		</item>
		<item>
		<title>More frequent alerts with CLI dispatch</title>
		<link>http://blogs.splunk.com/andrea/2008/07/14/more-frequent-alerts-with-cli-dispatch/</link>
		<comments>http://blogs.splunk.com/andrea/2008/07/14/more-frequent-alerts-with-cli-dispatch/#comments</comments>
		<pubDate>Mon, 14 Jul 2008 18:17:14 +0000</pubDate>
		<dc:creator>andrea</dc:creator>
		
		<category><![CDATA[CLI]]></category>

		<category><![CDATA[alerting]]></category>

		<category><![CDATA[dispatch]]></category>

		<category><![CDATA[saved searches]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/andrea/?p=393</guid>
		<description><![CDATA[The saved search scheduler that the UI uses runs into trouble when you start running a bunch of searches at the same time. It kicks off one, waits for it to return or timeout and then moves on to the next. If the searches take more than a few seconds to run or there are [...]]]></description>
			<content:encoded><![CDATA[<p>The saved search scheduler that the UI uses runs into trouble when you start running a bunch of searches at the same time. It kicks off one, waits for it to return or time out, and then moves on to the next. If the searches take more than a few seconds to run or there are dozens of them all with high frequency, it gets overloaded. One way to address this is to take advantage of the new dispatch (asynchronous search). Dispatch is what is behind the REST API search functions, and you can also get to it from the CLI with the &#8220;dispatch&#8221; command instead of the old &#8220;search&#8221;.</p>
<p>Old CLI search: </p>
<p>./splunk search "sourcetype=access_combined googlebot | stats count" -maxresults 500<br />
count<br />
-----<br />
213  </p>
<p>New CLI search: </p>
<p>./splunk dispatch "sourcetype=access_combined googlebot | stats count"<br />
count<br />
-----<br />
213  </p>
<p>While the results look the same for this simple search, a lot is different behind the scenes. The search command needs to load all the events it touches into memory, so there is only so much of the index it can search at one time. The data generation part, before the pipe, will only return maxresults number of events, which may not be all of them. If you then filter with additional search commands you won&#8217;t get all of what you think you should. You can increase maxresults (default for the CLI is 100) but you can only push it so far before you run into memory problems. </p>
<p>The dispatch search kicks off a job that runs until completion, no matter how long it takes. But one thing to keep in mind is that CLI dispatch is designed for reporting: the actual results are all in memory so you can&#8217;t get back thousands of results from a single search. Use reporting commands like stats or narrow your searches so they won&#8217;t have more than a couple hundred results. (If you need more, write something that uses the REST API where you have access to job control.)</p>
<p>So here is how this applies to alerting: </p>
<p>In the UI, when a scheduled search runs, it uses a search command to actually generate the alert. There are a couple different ones, but as most people want an email I&#8217;ll focus on sendemail. (Docs here: http://www.splunk.com/doc/3.3/user/UnsupportedCommands#sendemail.) </p>
<p>Any search can use the sendemail search command, it&#8217;s not limited to the UI. So I can do this: </p>
<p>./splunk dispatch "error | sendemail to=sysadmins@example.com from=splunk@example.com" </p>
<p>This runs the search and then looks for a mail server (by default on the local machine) to send the message. Since it&#8217;s using dispatch, you can kick off a bunch of these and they will all run independently of each other. You can look at the jobs from the REST endpoint: </p>
<p>https://localhost:8089/services/search/jobs</p>
<p>Splunk Atom Feed: jobs<br />
Updated: 2008-07-14T10:39:16-0700 Splunk build: 38343<br />
dispatch<br />
cursorTime	1969-12-31T16:00:00.000-08:00<br />
error<br />
eventCount	316<br />
isDone	1<br />
isFinalized	0<br />
isPaused	0<br />
isStreaming	0<br />
keywords	sudo<br />
resultCount	100<br />
sid	1216057125.31<br />
ttl	3570.9 seconds<br />
events - results - timeline - summary -<br />
control:</p>
<p>2008-07-14T10:38:47.000-07:00 | admin</p>
<p>Here&#8217;s an example I set up on my local machine, an OS X 10.5 box which uses postfix. I&#8217;ve already made sure postfix is running and I can receive mail to my local account. </p>
<p>I wrote a script that does 50 searches, all set to alert with an email address. Note the -auth in the command: if you aren&#8217;t already authenticated you will need to pass -auth as part of the CLI search. In a production environment, you would want a more sophisticated means of handling login credentials than sticking plaintext into a script. (You could also use a restricted user created only for CLI searches.) </p>
<p>[root]:/opt/splunk3.3/bin$ more alert_overload.sh<br />
./splunk dispatch "sudo | sendemail to=feorlen from=foo01" -auth admin:changeme&#038;<br />
./splunk dispatch "sudo | sendemail to=feorlen from=foo02" -auth admin:changeme&#038;<br />
./splunk dispatch "sudo | sendemail to=feorlen from=foo03" -auth admin:changeme&#038;<br />
./splunk dispatch "sudo | sendemail to=feorlen from=foo04" -auth admin:changeme&#038;<br />
./splunk dispatch "sudo | sendemail to=feorlen from=foo05" -auth admin:changeme&#038;<br />
./splunk dispatch "sudo | sendemail to=feorlen from=foo06" -auth admin:changeme&#038;<br />
./splunk dispatch "sudo | sendemail to=feorlen from=foo07" -auth admin:changeme&#038;<br />
[...]</p>
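<p>Fifty nearly identical lines are a good candidate for generating instead of typing. This Python sketch builds the same commands as argument lists. The function name is hypothetical, the credentials are the placeholder ones from the script above, and the part that actually launches the searches is left as a comment because it needs a running Splunk instance.</p>

```python
def dispatch_command(n, auth="admin:changeme", to="feorlen"):
    """Return the argument list for one test search, numbered like the script above."""
    search = "sudo | sendemail to={} from=foo{:02d}".format(to, n)
    return ["./splunk", "dispatch", search, "-auth", auth]


commands = [dispatch_command(n) for n in range(1, 51)]
print(len(commands), commands[0])

# To launch them all in the background from the Splunk bin directory:
#   import subprocess
#   jobs = [subprocess.Popen(cmd) for cmd in commands]
#   for job in jobs:
#       job.wait()
```

<p>Passing the search as a single list element avoids shell quoting problems with the pipe inside the search string.</p>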
<p>When I run this script, it starts up all these searches. (Note that each one starts up another python! Keep that in mind.) When they complete, they send an email alert. </p>
<p> N 16 foo13@AndreasSplunkP  Mon Jul 14 10:59 393/280230 "Splunk Results"<br />
 N 17 foo17@AndreasSplunkP  Mon Jul 14 10:59 393/280230 "Splunk Results"<br />
 N 18 foo11@AndreasSplunkP  Mon Jul 14 10:59 393/280230 "Splunk Results"<br />
 N 19 foo32@AndreasSplunkP  Mon Jul 14 10:59 393/280230 "Splunk Results"<br />
 N 20 foo09@AndreasSplunkP  Mon Jul 14 10:59 393/280230 "Splunk Results"<br />
? s* dispatch_test.mbox<br />
"dispatch_test.mbox" [New file]<br />
? x<br />
AndreasSplunkPowerbook-2[feorlen]:~$ grep ^From: dispatch_test.mbox | wc -l<br />
      50</p>
<p>The messages don&#8217;t arrive in order, but they do arrive. All 50 test searches finished in about 20 seconds. More complicated searches will take longer. One thing to know: if you start searches faster than they can complete, say a two-minute search launched every minute, they will back up and take a while to finish. There is no hard guideline, as it depends on the individual searches and the overall load on the instance. </p>
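<p>Since the test script is the same line repeated with a different from address, it could be generated rather than typed out. Here is a minimal Python sketch of that idea. The function name build_dispatch_commands is made up for illustration, and the search, recipient and credentials are simply the ones from the script above.</p>

```python
def build_dispatch_commands(count, search="sudo", to_addr="feorlen",
                            auth="admin:changeme"):
    """Build one backgrounded ./splunk dispatch line per alert search."""
    commands = []
    for n in range(1, count + 1):
        # from=foo01, foo02, ... so each alert email can be told apart
        query = f"{search} | sendemail to={to_addr} from=foo{n:02d}"
        commands.append(f'./splunk dispatch "{query}" -auth {auth} &')
    return commands


if __name__ == "__main__":
    # Print the 50 lines; redirect the output to a file to get a script.
    print("\n".join(build_dispatch_commands(50)))
```

<p>The same caveat about plaintext credentials applies to anything generated this way.</p>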
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/andrea/2008/07/14/more-frequent-alerts-with-cli-dispatch/feed/</wfw:commentRss>
		</item>
		<item>
		<title>overriding default syslog host extraction</title>
		<link>http://blogs.splunk.com/andrea/2008/04/16/overriding-default-syslog-host-extraction/</link>
		<comments>http://blogs.splunk.com/andrea/2008/04/16/overriding-default-syslog-host-extraction/#comments</comments>
		<pubDate>Wed, 16 Apr 2008 17:04:50 +0000</pubDate>
		<dc:creator>andrea</dc:creator>
		
		<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/andrea/2008/04/16/overriding-default-syslog-host-extraction/</guid>
		<description><![CDATA[I had a customer recently ask how to change the host that was applied to a particular set of incoming events. Normally this wouldn&#8217;t be a big deal, just specify the new name in inputs.conf. But this is from syslog. When you set one of the syslog sourcetypes there is some extra processing to extract [...]]]></description>
			<content:encoded><![CDATA[<p>I had a customer recently ask how to change the host that was applied to a particular set of incoming events. Normally this wouldn&#8217;t be a big deal: just specify the new name in inputs.conf. But this is from syslog. When you set one of the syslog sourcetypes, there is some extra processing to extract the correct hostname, which overrides other settings. And the hostname in the event is wrong.</p>
<p>So to get the right one, I set up this transform to force the host to a specified value while still keeping my correct syslog sourcetype.<br />
My inputs.conf is tailing an entire directory, which for the sake of demonstration I&#8217;m going to pretend is all syslog.</p>
<pre>
$ more inputs.conf
host = support09.splunk.com
[tail:///var/log]
disabled = false
host = support09.splunk.com
sourcetype = syslog</pre>
<p>props.conf is specifying a transform only for the source of interest:</p>
<pre>
$ more props.conf
[source::/var/log/system.log]
# note: overriding default syslog transform!
TRANSFORMS = feorlenhost</pre>
<p>And transforms.conf defines what to do to it. I have to specify a REGEX, but I&#8217;m not actually using it, so I&#8217;ll just say &#8216;.&#8217;, which matches any event with at least one character. The FORMAT line is what is going to set my host:</p>
<pre>
$ more transforms.conf
[feorlenhost]
DEST_KEY = MetaData:Host
REGEX = .
FORMAT = host::feorlenhost.splunk.com</pre>
<p>So whatever syslog put in there for host, ignore it and use my static value instead.</p>
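<p>To make the REGEX / DEST_KEY / FORMAT interplay concrete, here is a toy Python model of the stanza above. It is only an illustration of the logic (if the regex matches the raw event, write FORMAT into the destination key), not how Splunk implements it, and it ignores capture-group substitution in FORMAT. The function name apply_transform and the sample event are hypothetical.</p>

```python
import re


def apply_transform(raw_event, metadata, regex, dest_key, fmt):
    """Return a copy of metadata with dest_key set to fmt if regex matches."""
    updated = dict(metadata)
    if re.search(regex, raw_event) is not None:
        updated[dest_key] = fmt
    return updated


# Hypothetical syslog event carrying the wrong hostname.
event = "Apr 16 10:04:50 wronghost sshd[123]: session opened"
before = {"MetaData:Host": "host::wronghost"}
after = apply_transform(
    event,
    before,
    regex=".",                              # matches any non-empty event
    dest_key="MetaData:Host",
    fmt="host::feorlenhost.splunk.com",
)
print(after["MetaData:Host"])
```

<p>Because &#8216;.&#8217; matches any non-empty event, the host is always overwritten for the sources this transform is attached to.</p>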
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/andrea/2008/04/16/overriding-default-syslog-host-extraction/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Digging into metrics.log</title>
		<link>http://blogs.splunk.com/andrea/2008/03/13/digging-into-metricslog/</link>
		<comments>http://blogs.splunk.com/andrea/2008/03/13/digging-into-metricslog/#comments</comments>
		<pubDate>Thu, 13 Mar 2008 18:21:16 +0000</pubDate>
		<dc:creator>andrea</dc:creator>
		
		<category><![CDATA[tech]]></category>

		<category><![CDATA[metrics]]></category>

		<category><![CDATA[thruput]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/andrea/2008/03/13/digging-into-metricslog/</guid>
		<description><![CDATA[Occasionally people ask for help in identifying a rogue data input that is suddenly spewing events. If it&#8217;s hidden in a ton of similar data it can be difficult to sort out which one is actually the problem. One place to look is the Splunk internal metrics.log. You can find it by searching the internal [...]]]></description>
			<content:encoded><![CDATA[<p>Occasionally people ask for help in identifying a rogue data input that is suddenly spewing events. If it&#8217;s hidden in a ton of similar data, it can be difficult to sort out which one is actually the problem. One place to look is the Splunk internal metrics.log. You can find it by searching the internal index (add &#8220;index=_internal&#8221; to your search) or just look in the file itself (located in $SPLUNK_HOME/var/log/splunk).</p>
<p>Before I get into what can be found there, I need to explain what metrics.log is not. It is a sampling over 30-second intervals, so it will not give you an exact accounting of all your inputs. For each type of item reported, you get the top ten hot sources over the interval, based on the size of the event (_raw). It is different from the numbers reported by LicenseManager, which include the indexed fields. Also, the default configuration only keeps the metrics data in the internal index for a few days, but by going to the files you can see trends over a period of months if your rolled files go that far back.</p>
<p>A typical metrics.log has stuff like this:</p>
<p>03-13-2008 10:48:55.620 INFO  Metrics - group=pipeline, name=tail, processor=tail, cpu_seconds=0.000000, executes=31, cumulative_hits=73399<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=pipeline, name=typing, processor=annotator, cpu_seconds=0.000000, executes=63, cumulative_hits=134912<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=pipeline, name=typing, processor=clusterer, cpu_seconds=0.000000, executes=63, cumulative_hits=134912<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=pipeline, name=typing, processor=readerin, cpu_seconds=0.000000, executes=63, cumulative_hits=134912<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=pipeline, name=typing, processor=sendout, cpu_seconds=0.000000, executes=63, cumulative_hits=134912<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=thruput, name=index_thruput, instantaneous_kbps=0.302766, instantaneous_eps=2.129032, average_kbps=0.000000, total_k_processed=19757, load_average=0.124023<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=per_host_thruput, series=&quot;fthost&quot;, kbps=0.019563, eps=0.096774, kb=0.606445<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=per_host_thruput, series=&quot;grumpy&quot;, kbps=0.283203, eps=2.032258, kb=8.779297<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=per_index_thruput, series=&quot;_internal&quot;, kbps=0.275328, eps=1.903226, kb=8.535156<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=per_index_thruput, series=&quot;_thefishbucket&quot;, kbps=0.019563, eps=0.096774, kb=0.606445<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=per_index_thruput, series=&quot;default&quot;, kbps=0.007876, eps=0.129032, kb=0.244141<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=per_source_thruput, series=&quot;/applications/splunk3.2/var/log/splunk/metrics.log&quot;, kbps=0.272114, eps=1.870968, kb=8.435547<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=per_source_thruput, series=&quot;/applications/splunk3.2/var/log/splunk/splunkd.log&quot;, kbps=0.003213, eps=0.032258, kb=0.099609<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=per_source_thruput, series=&quot;/var/log/apache2/somedomain_access_log&quot;, kbps=0.007876, eps=0.096774, kb=0.244141<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=per_source_thruput, series=&quot;filetracker&quot;, kbps=0.019563, eps=0.096774, kb=0.606445<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=per_sourcetype_thruput, series=&quot;access_common&quot;, kbps=0.007876, eps=0.129032, kb=0.244141<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=per_sourcetype_thruput, series=&quot;filetrackercrclog&quot;, kbps=0.019563, eps=0.096774, kb=0.606445<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=per_sourcetype_thruput, series=&quot;splunkd&quot;, kbps=0.275328, eps=1.903226, kb=8.535156<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=queue, name=aeq, max_size=10, filled_count=0, empty_count=0, current_size=0, largest_size=0, smallest_size=0<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=queue, name=aq, max_size=10, filled_count=0, empty_count=0, current_size=0, largest_size=0, smallest_size=0<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=queue, name=tailingq, current_size=0, largest_size=0, smallest_size=0<br />
03-13-2008 10:48:55.620 INFO  Metrics - group=queue, name=udp_queue, max_size=1000, filled_count=0, empty_count=0, current_size=0, largest_size=0, smallest_size=0</p>
<p>There&#8217;s a lot more there than just volume data, but for now I&#8217;ll focus on investigating data inputs. &#8220;group=&#8221; identifies what type of thing is being reported on, and &#8220;series=&#8221; identifies the particular item. For incoming events, the amount of data processed is in the thruput groups, as in per_host_thruput. In my case, I&#8217;m only indexing data from one host, so per_host_thruput actually can tell me something useful: that right now host &#8220;grumpy&#8221; indexes around 8k in a 30-second period. Since there is only one host I could add it all up and get a good picture of what I&#8217;m indexing, but if I had more than 10 hosts I would only get a sample.</p>
<p>03-13-2008 10:49:57.634 INFO  Metrics - group=per_host_thruput, series=&quot;grumpy&quot;, kbps=0.245401, eps=1.774194, kb=7.607422<br />
03-13-2008 10:50:28.642 INFO  Metrics - group=per_host_thruput, series=&quot;grumpy&quot;, kbps=0.237053, eps=1.612903, kb=7.348633<br />
03-13-2008 10:50:59.648 INFO  Metrics - group=per_host_thruput, series=&quot;grumpy&quot;, kbps=0.217584, eps=1.548387, kb=6.745117<br />
03-13-2008 10:51:30.656 INFO  Metrics - group=per_host_thruput, series=&quot;grumpy&quot;, kbps=0.245621, eps=1.741935, kb=7.614258<br />
03-13-2008 10:52:01.661 INFO  Metrics - group=per_host_thruput, series=&quot;grumpy&quot;, kbps=0.311051, eps=2.290323, kb=9.642578<br />
03-13-2008 10:52:32.669 INFO  Metrics - group=per_host_thruput, series=&quot;grumpy&quot;, kbps=0.296938, eps=2.322581, kb=9.205078<br />
03-13-2008 10:53:03.677 INFO  Metrics - group=per_host_thruput, series=&quot;grumpy&quot;, kbps=0.261593, eps=1.838710, kb=8.109375<br />
03-13-2008 10:53:34.686 INFO  Metrics - group=per_host_thruput, series=&quot;grumpy&quot;, kbps=0.263136, eps=2.032258, kb=8.157227<br />
03-13-2008 10:54:05.692 INFO  Metrics - group=per_host_thruput, series=&quot;grumpy&quot;, kbps=0.261530, eps=1.806452, kb=8.107422<br />
03-13-2008 10:54:36.699 INFO  Metrics - group=per_host_thruput, series=&quot;grumpy&quot;, kbps=0.313855, eps=2.354839, kb=9.729492</p>
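<p>If you go to the file rather than the index, lines like these are easy to pull apart yourself. Here is a minimal Python sketch that splits one line into a timestamp and its key=value fields. It assumes the layout shown above (straight double quotes around series values, comma-separated pairs); the function name parse_metrics_line is made up for illustration.</p>

```python
import re

# key=value pairs; a value is either double-quoted or runs to the next comma.
PAIR = re.compile(r'(\w+)=("[^"]*"|[^,\s]+)')


def parse_metrics_line(line):
    """Split one metrics.log line into (timestamp, fields), or None."""
    head, sep, body = line.partition(" Metrics - ")
    if not sep:
        return None  # not a Metrics line
    timestamp = " ".join(head.split()[:2])  # date and time, without INFO
    fields = {key: value.strip('"') for key, value in PAIR.findall(body)}
    # The thruput numbers are more useful as floats than as strings.
    for key in ("kbps", "eps", "kb"):
        if key in fields:
            fields[key] = float(fields[key])
    return timestamp, fields


sample = ('03-13-2008 10:49:57.634 INFO  Metrics - group=per_host_thruput, '
          'series="grumpy", kbps=0.245401, eps=1.774194, kb=7.607422')
timestamp, fields = parse_metrics_line(sample)
print(timestamp, fields["group"], fields["series"], fields["kb"])
```

<p>Filtering on fields["group"] and fields["series"] then gives the same per-host or per-sourcetype slices shown in this post.</p>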
<p>The per_sourcetype_thruput group works the same way. I know that access_common is a popular sourcetype for events on this webserver, so it gives me a good idea of what is happening:</p>
<p>03-13-2008 10:51:30.656 INFO  Metrics - group=per_sourcetype_thruput, series=&quot;access_common&quot;, kbps=0.022587, eps=0.193548, kb=0.700195<br />
03-13-2008 10:52:01.661 INFO  Metrics - group=per_sourcetype_thruput, series=&quot;access_common&quot;, kbps=0.053585, eps=0.451613, kb=1.661133<br />
03-13-2008 10:52:32.670 INFO  Metrics - group=per_sourcetype_thruput, series=&quot;access_common&quot;, kbps=0.031786, eps=0.419355, kb=0.985352<br />
03-13-2008 10:53:34.686 INFO  Metrics - group=per_sourcetype_thruput, series=&quot;access_common&quot;, kbps=0.030998, eps=0.387097, kb=0.960938<br />
03-13-2008 10:54:36.700 INFO  Metrics - group=per_sourcetype_thruput, series=&quot;access_common&quot;, kbps=0.070092, eps=0.612903, kb=2.172852<br />
03-13-2008 10:56:09.722 INFO  Metrics - group=per_sourcetype_thruput, series=&quot;access_common&quot;, kbps=0.023564, eps=0.290323, kb=0.730469<br />
03-13-2008 10:56:40.730 INFO  Metrics - group=per_sourcetype_thruput, series=&quot;access_common&quot;, kbps=0.006048, eps=0.096774, kb=0.187500<br />
03-13-2008 10:57:11.736 INFO  Metrics - group=per_sourcetype_thruput, series=&quot;access_common&quot;, kbps=0.017578, eps=0.161290, kb=0.544922<br />
03-13-2008 10:58:13.748 INFO  Metrics - group=per_sourcetype_thruput, series=&quot;access_common&quot;, kbps=0.025611, eps=0.225806, kb=0.793945</p>
<p>But I&#8217;ve got way more than 10 sourcetypes, so at any particular time some other one could spike and access_common wouldn&#8217;t be reported. per_index_thruput and per_source_thruput work similarly.</p>
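<p>A small Python sketch shows why the top-ten cutoff turns the numbers into a sample rather than a full accounting. The sourcetype names and kb values are invented for illustration, and ranking by kb is an assumption based on the description of &#8220;top ten by event size&#8221; above.</p>

```python
import heapq


def top_series(interval_kb, limit=10):
    """Return the `limit` busiest (series, kb) pairs, largest first."""
    return heapq.nlargest(limit, interval_kb.items(), key=lambda item: item[1])


# Hypothetical 30-second interval with 12 sourcetypes: 1 kb, 2 kb, ... 12 kb.
interval_kb = {f"sourcetype_{n:02d}": float(n) for n in range(1, 13)}

reported = top_series(interval_kb)
reported_total = sum(kb for _, kb in reported)
actual_total = sum(interval_kb.values())

# The two quietest sourcetypes never show up in this interval.
missing = sorted(set(interval_kb) - {name for name, _ in reported})
print(reported_total, actual_total, missing)
```

<p>Summing what is reported undercounts whenever there are more than ten series, and which series get dropped can change from one interval to the next.</p>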
<p>With this in mind, let&#8217;s dissect the standard saved search &#8220;KB indexed per hour last 24 hours&#8221;.</p>
<p>index::_internal metrics group=per_index_thruput NOT debug NOT sourcetype::splunk_web_access | timechart fixedrange=t span=1h sum(kb) | rename sum(kb) as totalKB</p>
<p>This means look in the internal index for metrics data of group per_index_thruput, ignore some internal stuff and make a report showing the sum of the kb values. For cleverness, we&#8217;ll also rename the output to something meaningful, &#8220;totalKB&#8221;. The result looks like this:</p>
<p>sum of kb vs. time for results in the past day<br />
_time	totalKB<br />
1	03/12/2008 11:00:00	922.466802<br />
2	03/12/2008 12:00:00	1144.674811<br />
3	03/12/2008 13:00:00	1074.541995<br />
4	03/12/2008 14:00:00	2695.178730<br />
5	03/12/2008 15:00:00	1032.747082<br />
6	03/12/2008 16:00:00	898.662123</p>
<p>Those totalKB values are just the sum of kb over each one-hour interval. If I like, I can change the search and get just the ones from grumpy:</p>
<p>index::_internal metrics grumpy group=per_host_thruput  | timechart fixedrange=t span=1h sum(kb) | rename sum(kb) as totalKB</p>
<p>sum of kb vs. time for results in the past day<br />
_time	totalKB<br />
1	03/12/2008 11:00:00	746.471681<br />
2	03/12/2008 12:00:00	988.568358<br />
3	03/12/2008 13:00:00	936.092772<br />
4	03/12/2008 14:00:00	2529.226566<br />
5	03/12/2008 15:00:00	914.945313<br />
6	03/12/2008 16:00:00	825.353518</p>
<p>Or just the access_common sourcetype:</p>
<p>index::_internal metrics access_common group=per_sourcetype_thruput  | timechart fixedrange=t span=1h sum(kb) | rename sum(kb) as totalKB</p>
<p>sum of kb vs. time for results in the past day<br />
_time	totalKB<br />
1	03/12/2008 11:00:00	65.696285<br />
2	03/12/2008 12:00:00	112.035162<br />
3	03/12/2008 13:00:00	59.775395<br />
4	03/12/2008 14:00:00	35.008788<br />
5	03/12/2008 15:00:00	62.478514<br />
6	03/12/2008 16:00:00	14.173828</p>
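<p>The same hourly roll-up can be approximated straight from the file, which helps when the data has already aged out of the internal index. This Python sketch is only a rough stand-in for timechart span=1h sum(kb), not what Splunk runs internally. It assumes the per_*_thruput line layout shown earlier, and the function name hourly_kb is made up for illustration.</p>

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the per_*_thruput lines: timestamp, group, quoted series, kb value.
LINE = re.compile(
    r'^(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\.\d+ INFO\s+Metrics - '
    r'group=(\w+), series="([^"]*)".*\bkb=([\d.]+)'
)


def hourly_kb(lines, group, series=None):
    """Sum kb per clock hour for one group, optionally for one series."""
    totals = defaultdict(float)
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue  # pipeline, queue and other non-thruput lines
        stamp, line_group, line_series, kb = match.groups()
        if line_group != group:
            continue
        if series is not None and line_series != series:
            continue
        hour = datetime.strptime(stamp, "%m-%d-%Y %H:%M:%S").replace(
            minute=0, second=0)
        totals[hour] += float(kb)
    return dict(sorted(totals.items()))


# Three lines taken from the samples earlier in this post.
sample_lines = [
    '03-13-2008 10:49:57.634 INFO  Metrics - group=per_host_thruput, '
    'series="grumpy", kbps=0.245401, eps=1.774194, kb=7.607422',
    '03-13-2008 10:50:28.642 INFO  Metrics - group=per_host_thruput, '
    'series="grumpy", kbps=0.237053, eps=1.612903, kb=7.348633',
    '03-13-2008 10:51:30.656 INFO  Metrics - group=per_sourcetype_thruput, '
    'series="access_common", kbps=0.022587, eps=0.193548, kb=0.700195',
]
for hour, total_kb in hourly_kb(sample_lines, "per_host_thruput", "grumpy").items():
    print(hour, round(total_kb, 6))
```

<p>Feeding it the lines of metrics.log and any rolled copies would give hourly totals going back as far as the files do, with the same top-ten sampling caveat as the searches.</p>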
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/andrea/2008/03/13/digging-into-metricslog/feed/</wfw:commentRss>
		</item>
		<item>
		<title>conf files, part 2</title>
		<link>http://blogs.splunk.com/andrea/2007/12/12/conf-files-part-2/</link>
		<comments>http://blogs.splunk.com/andrea/2007/12/12/conf-files-part-2/#comments</comments>
		<pubDate>Thu, 13 Dec 2007 00:30:43 +0000</pubDate>
		<dc:creator>andrea</dc:creator>
		
		<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/andrea/2007/12/12/conf-files-part-2/</guid>
		<description><![CDATA[Here are a couple more of my conf files explained. First the simple one:
server.conf


[sslConfig]
enableSplunkSearchSSL = true

All this says is that I&#8217;m using SSL on the front end. I clicky clicky the nice UI control and it magically happens. There could be a pile of other stuff in here, like specifying real paid-money-for certs if I [...]]]></description>
			<content:encoded><![CDATA[<p>Here are a couple more of my conf files explained. First the simple one:</p>
<p><strong>server.conf</strong></p>
<blockquote>
<pre>
[sslConfig]
enableSplunkSearchSSL = true</pre>
</blockquote>
<p>All this says is that I&#8217;m using SSL on the front end. I clicky clicky the nice UI control and it magically happens. There could be a pile of other stuff in here, like specifying real paid-money-for certs if I were using any. But I&#8217;m not. Self-signed works for me, even if it means my users get whiny messages from their browsers. Whatever.</p>
<p><strong>access_controls.conf</strong></p>
<blockquote>
<pre>
[roles]
apache2 = source::/var/log/apache2

[groups]
hosted_user = apache2

[users]
user1 = hosted_user</pre>
</blockquote>
<p>I added some access controls to help out one of my novice users, somebody who maintains the content on several sites but isn&#8217;t a big sysadmin. I set up a role that only allows access to the apache logs and assigned it to the group hosted_user, which is then specified for user1. I thought about giving her access to just the files she needs, but that would mean specifying them each individually, either in multiple roles or in a single role with a bunch of OR terms.</p>
<p>Here&#8217;s where the trouble starts. Granular access controls are fundamentally just another search, one built for the user with the administrator&#8217;s desired restrictions. Each role essentially adds another OR, in addition to any that may be in the role itself. For 3.1.x, OR is a bad thing. More than a couple of them and searches grind to a halt. I could do some funny business with moving around the locations of the files and put hers in a subdirectory. But it&#8217;s not worth the bother; the whole apache log directory is fine.</p>
<p>This is one part that is changing in preview, with the addition of flexible roles. Also, other improvements are making searches with OR basically not an issue.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/andrea/2007/12/12/conf-files-part-2/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
