<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>erik</title>
	<atom:link href="http://blogs.splunk.com/erik/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.splunk.com/erik</link>
	<description>Just another WordPress weblog</description>
	<pubDate>Sat, 20 Sep 2008 01:41:02 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
	<language>en</language>
			<item>
		<title>Search engine for virtual sprawl - vmware app for splunk</title>
		<link>http://blogs.splunk.com/erik/2008/08/10/search-engine-for-virutal-sprawl-vmware-app-for-splunk/</link>
		<comments>http://blogs.splunk.com/erik/2008/08/10/search-engine-for-virutal-sprawl-vmware-app-for-splunk/#comments</comments>
		<pubDate>Sun, 10 Aug 2008 22:57:24 +0000</pubDate>
		<dc:creator>erik</dc:creator>
		
		<category><![CDATA[Homepage]]></category>

		<category><![CDATA[api]]></category>

		<category><![CDATA[dev]]></category>

		<category><![CDATA[hacks]]></category>

		<category><![CDATA[life]]></category>

		<category><![CDATA[platform]]></category>

		<category><![CDATA[release]]></category>

		<category><![CDATA[splunk base]]></category>

		<category><![CDATA[tech]]></category>

		<category><![CDATA[management]]></category>

		<category><![CDATA[sprawl]]></category>

		<category><![CDATA[virtualization]]></category>

		<category><![CDATA[vmware]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/erik/?p=396</guid>
		<description><![CDATA[****  UPDATE - 09/16/08  ****
Thanks to more testing i have found and fixed a few critical bugs.
Updated APP version 1.6 >> here << [...]]]></description>
			<content:encoded><![CDATA[<p><b>****  UPDATE - 09/16/08  ****</b></p>
<p>Thanks to more testing i have found and fixed a few critical bugs.<br />
Updated APP version 1.6 <a href="http://blogs.splunk.com/erik/wp-content/uploads/2008/09/vmware1.zip">>> here <<</a></p>
<ul>
<li>
there was a static var preventing the multiple server configs from working. Should be fixed, and multiple servers in the vmware.conf should work.
</li>
<li>
IBM JVMs should work - i.e. AIX should now work <img src='http://blogs.splunk.com/erik/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />
</li>
<li>
Added new saved searches and a few dashboards ( thanks to raffy <img src='http://blogs.splunk.com/erik/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> )
</li>
</ul>
<p>As usual, please let me know if you find any bugs.<br />
I&#8217;ll type up some notes on my VMworld experience</p>
<p>Cheers,<br />
e</p>
<p><b>****  UPDATE - 09/08/08  ****</b><br />
Thanks to lots of folks trying it out i have found a critical bug that was preventing much of the data from getting indexed. <a href="http://blogs.splunk.com/erik/wp-content/uploads/2008/09/vmware.zip">This latest release 1.5</a> should have that fix and everyone should see all the wonderful VMWare data in the index.</p>
<p>As usual, bug me if it does not work or you have any questions.</p>
<p>If you have made changes to vmware/local/vmware.conf and not to the file in default you can just unzip this version on top of your old one. If you are making changes to the default/vmware.conf file, i&#8217;d move them to local/vmware.conf so that when i ship updates they will not blow away your conf changes. We ship only default and not local/vmware.conf.</p>
<p>Thanks again to everyone that helped find bugs!</p>
<p>e.</p>
<p><b>****  UPDATE - 08/27/08  ****</b><br />
I have <a href="http://blogs.splunk.com/erik/wp-content/uploads/2008/08/vmware1.zip">updated the app</a> with a few fixes found in the field. </p>
<ul>
<li>hopefully fixed issue on AIX (IBM jvm )
</li>
<li>added output of host/vm name on update messages. It was hard to tell where the messages were coming from
</li>
<li>added more debugging info on startup to help debug connection issues.
</li>
</ul>
<p>Things that are still under investigation.</p>
<ul>
<li>Pointing at lots of ESX servers and not VC. Seems as though some data is not coming back from ESX.
</li>
<li>Making it work with older JVMs ( currently it seems i require 1.5 )
</li>
</ul>
<p><b>****  Original Post 08/10/08  ****</b><br />
I wanted to release this a few months ago but the project kept getting stuck on the back-burner.  Finally I&#8217;ve cleaned it up and had a few people try it and it seems to work well. I&#8217;m sure there are configurations and versions out there that will have issues - please write me back ( my first name at splunk.com ) if it does not work as advertised. </p>
<p>Reading the below makes it sound more difficult than it really is. Just download, un-zip, change the server url, username and password in the vmware.conf file, restart and go! This really is the first public release and i&#8217;d love to get more feedback. I&#8217;ll more than gladly send you a Splunk tee shirt of your choice if you help find bugs or have useful suggestions!</p>
<p><strong>Why you want to give it a try:</strong><br />
This vmware app is a cool way to keep track of what your VC and ESX servers are up to, what instances are running where, when they are under load, when instances move, when they have errors, and much more. Since all the data is indexed in Splunk, it&#8217;s easy and quick to search for problems and report on your virtual sprawl. </p>
<p><strong>How it works:</strong><br />
This app will connect a splunk server to any number of Virtual Center and/or ESX servers and grab/index the events, logs, properties, performance data, and anything else I can get my grubby mitts on. It&#8217;s easy to hook up and get going, so if you use Virtual Center or ESX then give this app a try. I&#8217;ll explain how to install/setup, how to troubleshoot, and what you will see when you get it working. You will need to install splunk or use an existing Splunk server.  See the configuration file for settings on how often to pull data. Also near the end of this post i give example searches to explain the data.</p>
<p>After installing you get cool graphs like this one showing CPU Usage by Guest by Time:</p>
<p><img src="http://blogs.splunk.com/erik/wp-content/uploads/2008/08/picture-1.png" alt="cool graph" /></p>
<p><strong>Add Inside-out monitoring</strong><br />
It&#8217;s optional, but if you can also put splunk on the guest OS&#8217;s as light weight forwarders you will get a brilliant inside-out view where we capture not only what VC/ESX thinks but what the guests are seeing on the inside. My best practice is to put splunk on the guests and capture basic logs as well as OS performance metrics, what apps are running, how much mem/cpu they are taking, etc. You can get the <a href="http://www.splunkbase.com/apps/All/Technologies/Systems_Management/Monitoring/app:Splunk+for+UNIX">Unix/Linux version here</a> and the <a href="http://www.splunkbase.com/apps/All/Technologies/Operating_Systems/Windows/app:Splunk+for+Windows+Management">windows here</a>.  Of course it&#8217;s not required and you get a ton of value out of just the basic vmware app&#8217;s monitoring of VC/ESX.</p>
<p><strong>INSTALLATION:</strong></p>
<p>**Important**<br />
This app requires a JVM be installed on the same box as the splunk server. I know this is less than optimal. Please bug your local VMWare rep and tell them to make me REST API&#8217;s and not SOAP API&#8217;s. The VMware API&#8217;s are hideously overcomplicated - Please dear VMware make a simple REST interface.</p>
<p><strong>1)</strong> Make sure java is present and set the JAVAHOME environment variable. If not already set, you must set JAVAHOME to the directory whose bin subdirectory contains the java binary.</p>
<p><strong>2)</strong> To test that the variable is set correctly, try running the following on the command line<br />
<code>  windows> "%JAVAHOME%\bin\java"<br />
    linux/unix> $JAVAHOME/bin/java<br />
</code><br />
If it worked it should spit back a bunch of options to pass to the java command. If it&#8217;s not set right you will get some kind of file not found error.</p>
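<p>If you would rather script the check in step 2, here is a minimal Python sketch. It only looks for the file JAVAHOME/bin/java and does not run it. The function names are made up for illustration, and the java.exe file name on Windows is an assumption, not something the app itself checks.</p>

```python
import os


def java_binary(java_home, windows=False):
    """Build the path that step 2 asks you to run: JAVAHOME/bin/java."""
    sep = "\\" if windows else "/"
    exe = "java.exe" if windows else "java"  # assumption: the Windows binary is java.exe
    return java_home.rstrip("/\\") + sep + "bin" + sep + exe


def check_javahome(env=None):
    """Return (ok, message) saying whether JAVAHOME points at a java binary."""
    env = os.environ if env is None else env
    java_home = env.get("JAVAHOME")
    if not java_home:
        return False, "JAVAHOME is not set"
    path = java_binary(java_home, windows=(os.name == "nt"))
    if not os.path.isfile(path):
        return False, "no java binary at " + path
    return True, "found " + path


if __name__ == "__main__":
    print(check_javahome()[1])
```

<p>Running the file prints a one-line verdict for the current environment. A "no java binary" message is the scripted version of the file not found error described above.</p>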
<p><strong>3)</strong> Grab the vmware.zip file <strong><a href="http://blogs.splunk.com/erik/wp-content/uploads/2008/08/vmware.zip">HERE</a></strong>.</p>
<p><strong>4)</strong> Unzip the file - and copy the resultant &#8220;vmware&#8221; directory to your SPLUNK_HOME/etc/apps/ directory. When done the following directory should exist: SPLUNK_HOME/etc/apps/vmware.</p>
<p><strong>CONFIGURATION:</strong><br />
There are a few config settings to make the app work.</p>
<p><strong>5)</strong> First you need to let Splunk know where your VC or ESX servers are. Edit the <code>vmware/default/vmware.conf </code> configuration file to point to your vc or esx servers. If using VC you need not specify all ESX servers under management, splunk will get the list from VC. The config file contains one or more of the following stanzas ( the unique_name can be anything you like so long as it&#8217;s unique):<code><br />
	[vmserver:unique_name]<br />
</code><br />
For each [vmserver] stanza be sure to set:<code><br />
	url=https://your_server_IP/sdk<br />
	username=your_user<br />
	password=your_password<br />
</code></p>
<p>Note that the url should be the IP address of your server with &#8220;/sdk&#8221; at the end - for example &#8220;url=https://10.1.1.35/sdk&#8221;. A good way to test that the url and username/password are correct is to use a web browser. Take the url you have entered above and replace the &#8220;sdk&#8221; with &#8220;mob&#8221;. Use the web browser to navigate to that url and make sure it asks for username and password and that the values you entered above will authenticate correctly. If the &#8220;mob&#8221; url works with the username and password you entered then splunk should have no trouble.  </p>
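<p>The sdk-to-mob swap is easy to get wrong by hand, so here is a small Python sketch of it. The helper name mob_url is hypothetical. It only rewrites the string and does not contact the server, so you still need to open the result in a browser and log in.</p>

```python
from urllib.parse import urlsplit, urlunsplit


def mob_url(sdk_url):
    """Turn the configured .../sdk url into the .../mob url to try in a browser."""
    parts = urlsplit(sdk_url)
    path = parts.path.rstrip("/")
    if not path.endswith("/sdk"):
        raise ValueError("expected a url ending in /sdk, got " + sdk_url)
    # Swap only the final path segment; scheme and host stay untouched.
    new_path = path[: -len("sdk")] + "mob"
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(mob_url("https://10.1.1.35/sdk"))
```

<p>The ValueError is a deliberate choice: a url without the trailing /sdk is the kind of config mistake this step is meant to catch.</p>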
<p>With those three set you should be up and running after a restart.<br />
The rest of the config file should be self explanatory and is included at the end of this post for reference but you should not need to change anything else.</p>
<p><strong>Testing and Troubleshooting:</strong></p>
<p><strong>6)</strong> It&#8217;s best to test running the vmware app outside of splunk first.<br />
You&#8217;ll need to make sure that SPLUNK_HOME is set for the test.</p>
<p>**  On Windows  **:<br />
<code>   set SPLUNK_HOME=your splunk directory </code><br />
#note it does not like it when i add quotes around this path - try with no quotes.</p>
<p>Then run the app by hand<br />
<code>    > cd %SPLUNK_HOME%\etc\apps\vmware<br />
    >  java -jar lib/splunk.jar </code></p>
<p>**  On others  ** :<br />
<code>    export SPLUNK_HOME=your splunk directory </code></p>
<p>Then run the app by hand:<br />
<code>   > cd $SPLUNK_HOME/etc/apps/vmware<br />
    >  java -jar lib/splunk.jar  </code></p>
<p>It should spit out all sorts of vmware data. If it throws an error it&#8217;s likely that SPLUNK_HOME or JAVAHOME is NOT set. Remember SPLUNK_HOME will be set by the server when the server runs the script. You need only set it for testing.</p>
<p>If it does not work, likely the exception will have something useful in it such as connection refused ( bad auth ) or a 404 error in which case the url is incorrect.</p>
<p>If you get any non-obvious errors email me ( my first name at splunk.com ).</p>
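<p>Since the most likely failure in step 6 is a missing SPLUNK_HOME or JAVAHOME, a quick pre-flight check can save a round trip. This is a minimal Python sketch with hypothetical helper names. It reports which of the two variables are unset and spells out the directory and command used for the by-hand run. It does not start java for you.</p>

```python
import os

# The two variables the manual test depends on, per step 6.
REQUIRED_VARS = ("SPLUNK_HOME", "JAVAHOME")


def missing_vars(env=None):
    """List the required environment variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def manual_test_command(splunk_home):
    """Return (working directory, argv) for the by-hand run described above."""
    app_dir = os.path.join(splunk_home, "etc", "apps", "vmware")
    return app_dir, ["java", "-jar", "lib/splunk.jar"]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("not set: " + ", ".join(missing))
    else:
        app_dir, argv = manual_test_command(os.environ["SPLUNK_HOME"])
        print("cd " + app_dir)
        print(" ".join(argv))
```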
<p><strong>7)</strong> Try running in splunk.<br />
If the above test works then you should be able to just restart splunk and all should be good. The way to tell if it&#8217;s working is that you will get events with sourcetype vmware and vmware_api.</p>
<p><strong>8 )</strong> If you do NOT see events of type vmware_api on the dashboard then try the following searches:<br />
&#8220;index=_internal error&#8221;<br />
and<br />
&#8220;index=_internal  splunk4vmi.py&#8221;</p>
<p>You should see some kind of error or warning that is hopefully obvious. If not, again, email me and i&#8217;ll sort you out.</p>
<p><strong>Using the App</strong></p>
<p>At this point it should be working and you should be able to search for cool stuff.<br />
Here is a quick overview of what splunk is indexing:</p>
<p>After restarting you should see a bunch of logs from vmware and at least two new sourcetypes: vmware and vmware_api.  Below is a screen shot of my dashboard after restarting - notice the vmware logs and the vmware_api event counts.</p>
<p><img src="http://blogs.splunk.com/erik/wp-content/uploads/2008/08/picture-6.png" alt="sources" /></p>
<p>The vmware sourcetype is for the actual vmware logs while the vmware_api sourcetype is for the API calls. It can take a minute before they show up so if they are not there, try again after a minute. If you still do not have the logs that likely means the logs path in the vmware.conf is incorrect and you should make sure the path is correct or contact me. </p>
<p>If you do not see the API calls then there is likely an auth or url error that should have been caught when you did the manual test above. Try retesting by hand above - if the by-hand method works but not through splunk then contact me.</p>
<p>I&#8217;ve just started to explore the logs that come back - there is a ton of information in them but my test infrastructure is not all that interesting so i&#8217;m not sure what goodness you all might find in them. Poke around the files and see what you see and bug me if you see anything interesting i can make into alerts / reports.</p>
<p>The meat of the data is from the API where we pull everything we can.<br />
Most useful are: </p>
<p><strong>1) Metrics</strong><br />
Every few seconds we capture the metrics for all VMs, including:<br />
<img src="http://blogs.splunk.com/erik/wp-content/uploads/2008/08/properties.png" alt="metrics" /></p>
<p><strong>2) Events</strong><br />
I&#8217;m not sure of the scope of these but they look like interesting events kicked out by ESX. Someone with a larger VMware installation might find far more interesting events than i see on our infrastructure.<br />
<img src="http://blogs.splunk.com/erik/wp-content/uploads/2008/08/events.png" alt="events" /></p>
<p><strong>3) Updates:</strong><br />
It looks like when anything changes, we get an update.<br />
<img src="http://blogs.splunk.com/erik/wp-content/uploads/2008/08/updates.png" alt="updates" /></p>
<p><strong>4) Inventory: </strong><br />
I periodically just capture the inventory tree. It&#8217;s more for debugging than perhaps useful in a production environment but it does not cost much to get and it can be useful.<br />
<img src="http://blogs.splunk.com/erik/wp-content/uploads/2008/08/inventory.png" alt="inventory" /></p>
<p>Thanks to Christina we do ship with a bunch of saved searches. After installing you should see them, they all start with &#8216;VM:&#8217;. They are named to be somewhat obvious, again let me know if they don&#8217;t work or you have some better ones to add to the default app. Try some of the Metrics and Status saved searches to make sure your install is working.</p>
<ul>
<li>VM: Investigation CPU load on all guests sharing ESX server</li>
<li>VM: Investigation Find ESX Host for Guest</li>
<li>VM: Investigation Find Guests sharing ESX Server - Non FQDN</li>
<li>VM: Investigation- Find other VMs sharing ESX Host</li>
<li>VM: Investigation- Processes on hosts sharing ESX Server</li>
<li>VM: Investigation- Running processes on other guests on same ESX server</li>
<li>VM: Metrics- CPU by Guest last 60 minutes</li>
<li>VM: Metrics- Host Memory Usage last 15 minutes</li>
<li>VM: Metrics- Host Memory Usage last 60 minutes</li>
<li>VM: Metrics- Memory by Guest last 60 minutes</li>
<li>VM: Status- Free Space by Datastore</li>
<li>VM: Status- Running Guests</li>
<li>VM: Status- Running VMs </li>
</ul>
<p>That&#8217;s about it.<br />
Like i said, PLEASE email me if you have bugs or suggestions.<br />
I&#8217;ll plan on updating the app with whatever feedback i get from folks. So please, help me out and get yourself a tee shirt.</p>
<p>Kind Regards,<br />
e.</p>
<p>P.S. - here is a sample of the config just so that you can see what&#8217;s in it without downloading:<br />
&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br />
The following are the important values in the config file:</p>
<p><code><br />
[vmserver:demo]<br />
url=https://10.2.1.151/sdk      ## This is the url to the vc or esx server<br />
username=your_username     ## user name to auth against the server. If you are not sure of its value point a web browser at the above url and check the web auth, it will be the same.<br />
password=your_password            ## we will support non-clear text in the near future.<br />
ignorecert = t              ## for now leave as true (t), we will soon support checking of certs<br />
loggingLevel = error            ## to turn on debugging values are [error, warn, info, debug ]</p>
<p>index_events = t            ## should we index events (t)rue or (f)alse<br />
events_interval = 10            ## how often to check for events in seconds</p>
<p>index_properties = t            ## should we index properties (t)rue or (f)alse<br />
property_interval = 10      ## how often to check for properties in seconds</p>
<p>index_metrics = t           ## should we index metrics (t)rue or (f)alse<br />
metrics_interval = 10           ## how often to check for metrics in seconds</p>
<p>index_updates = t           ## should we index updates (t)rue or (f)alse<br />
updates_interval = 10       ## how often to check for updates in seconds</p>
<p>index_logs = t              ## should we index logs (t)rue or (f)alse<br />
logs_interval = 300         ## how often to get log changes&#8230;<br />
logs_localpath = ../var/spool/vmware    ## the logs are copied from vc/esx to this directory where splunk will pick them up for indexing<br />
</code></p>
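<p>If you want to sanity check your own vmware.conf before restarting, the layout above can be read with Python&#8217;s standard configparser. This is only a sketch under assumptions: it treats "##" as an inline comment marker and "t" as true, which matches the sample above, but it is not the parser the app itself uses. The helper names and the shortened SAMPLE text are illustrative.</p>

```python
import configparser

# A trimmed copy of the sample stanza above, as it would look in the plain-text file.
SAMPLE = """\
[vmserver:demo]
url=https://10.2.1.151/sdk      ## This is the url to the vc or esx server
username=your_username     ## user name to auth against the server
ignorecert = t              ## for now leave as true (t)
index_events = t            ## should we index events (t)rue or (f)alse
events_interval = 10            ## how often to check for events in seconds
logs_interval = 300         ## how often to get log changes
"""


def load_vmservers(text):
    """Return {unique_name: settings} for every [vmserver:unique_name] stanza."""
    parser = configparser.ConfigParser(
        delimiters=("=",),                # keep the ':' in https:// out of the key
        comment_prefixes=("#",),
        inline_comment_prefixes=("##",),  # assumption: '##' starts a trailing comment
        interpolation=None,
    )
    parser.read_string(text)
    servers = {}
    for section in parser.sections():
        if section.startswith("vmserver:"):
            name = section.split(":", 1)[1]
            servers[name] = dict(parser[section])
    return servers


def is_true(value):
    """Interpret the (t)rue / (f)alse flags used in the sample config."""
    return value.strip().lower() in ("t", "true")


if __name__ == "__main__":
    for name, settings in load_vmservers(SAMPLE).items():
        print(name, settings["url"], "events on:", is_true(settings["index_events"]))
```

<p>Swap SAMPLE for the contents of your own file to see which servers and intervals would be picked up under these assumptions.</p>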
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/erik/2008/08/10/search-engine-for-virutal-sprawl-vmware-app-for-splunk/feed/</wfw:commentRss>
		</item>
		<item>
		<title>My favorite &#8220;customer&#8221; and Splunk as multi-tenant platform</title>
		<link>http://blogs.splunk.com/erik/2008/07/22/my-favorite-customer-and-splunk-as-multi-tenant-platform/</link>
		<comments>http://blogs.splunk.com/erik/2008/07/22/my-favorite-customer-and-splunk-as-multi-tenant-platform/#comments</comments>
		<pubDate>Wed, 23 Jul 2008 04:27:25 +0000</pubDate>
		<dc:creator>erik</dc:creator>
		
		<category><![CDATA[Homepage]]></category>

		<category><![CDATA[dev]]></category>

		<category><![CDATA[platform]]></category>

		<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/erik/?p=394</guid>
		<description><![CDATA[Everyone has their favorite customer.
I have one too and he is the CTO of a very cool IVR/VoIP platform. His name is RJ Auburn
 
Around here his name is synonymous with filing 34 bugs between Sunday 9PM when we push bits to the site and 9AM when we get in to the office. I don&#8217;t mean the [...]]]></description>
			<content:encoded><![CDATA[<p>Everyone has their favorite customer.<br />
I have one too and he is the CTO of a very cool IVR/VoIP platform. His name is <a href="http://www.google.com/search?q=rj+auburn+voxeo&#038;ie=utf-8&#038;oe=utf-8&#038;aq=t&#038;rls=org.mozilla:en-US:official&#038;client=firefox-a">RJ Auburn</a><br />
<img class="alignleft size-medium wp-image-67" src="http://ecommmedia.com/mt-static/support/assets_c/userpics/userpic-87-100x100.png" align="left" border="16" margin="10" alt="rj"/> </p>
<p>Around here his name is synonymous with filing 34 bugs between Sunday 9PM, when we push bits to the site, and 9AM, when we get in to the office. I don&#8217;t mean the usual UI-is-off-by-10-pixels bugs but complex indexing or distributed search bugs. Well, sometimes it&#8217;s a trivial thing we missed, but usually he is pushing splunk to its limits. It&#8217;s not often that a CTO and &#8220;industry expert&#8221; is the one to personally put splunk through its paces - but RJ is like that and gets his hands dirty - and splunk is the better for it. </p>
<p>RJ and Voxeo are among a small, but quickly growing, number of companies that are using splunk in a multi-tenant environment. This means using splunk to collect data across multiple tenants in a hosted environment and then using splunk for searching and reporting on a per customer basis. Often the output of the searches/reports is rendered for the customer so they can see what is going on within the service. Customer dashboards and activity reports are a common use case for splunk.  Below are some of the images from the voxeo service:<br />
<br />
<img src="http://blogs.voxeo.com/voxeotalks/files/2008/06/prophecylogsearch3-1.jpg" alt="vox dash" /></p>
<p>On the <a href="http://blogs.voxeo.com/voxeotalks/2008/06/18/voxeo-announces-a-new-beta-service-prophecy-log-search-a-better-way-to-search-your-application-log-files/">Voxeo blog</a> there is a nice description and even a cool video introduction.</p>
<p>Lessons learned from these initial deployments are having a significant effect on our upcoming 4.0 release. First and foremost we will provide a much better html &#8220;module&#8221; system so that you can embed splunk modules in other webpages. Secondly, we will be making the overall splunk UI more configurable and modular so that multi-tenant customers can build even more custom UI&#8217;s. </p>
<p>One other very interesting trend is using splunk for SaaS using cloud services. Often these uses have some kind of multi-tenant &#8230;. It won&#8217;t be long before splunk makes deploying in the cloud even easier. More in a post to come but do drop me a line if you want to use splunk in the cloud and i can give you some hints.</p>
<p>In the mean time if you&#8217;re looking for the best push-it-to-the-limits beta tester, contact RJ!<br />
Thanks RJ <img src='http://blogs.splunk.com/erik/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>e.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/erik/2008/07/22/my-favorite-customer-and-splunk-as-multi-tenant-platform/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Congrats to FlowingData - strength in (subscriber) numbers!</title>
		<link>http://blogs.splunk.com/erik/2008/07/20/congrats-to-flowingdata-strength-in-subscriber-numbers/</link>
		<comments>http://blogs.splunk.com/erik/2008/07/20/congrats-to-flowingdata-strength-in-subscriber-numbers/#comments</comments>
		<pubDate>Sun, 20 Jul 2008 18:42:04 +0000</pubDate>
		<dc:creator>erik</dc:creator>
		
		<category><![CDATA[Homepage]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/erik/?p=392</guid>
		<description><![CDATA[We here at splunk are into processing lots of data. Our external marketing focuses mostly on hardcore IT data but internally we play with all sorts of data sets : government stats, sports stats, even music as shown by Brian&#8217;s cool post. 
I just wanted to congratulate Nathan over at FlowingData for crossing the 3100 [...]]]></description>
			<content:encoded><![CDATA[<p>We here at splunk are into processing lots of data. Our external marketing focuses mostly on hardcore IT data but internally we play with all sorts of data sets : government stats, sports stats, even music as shown by <a href="http://blogs.splunk.com/brian/2008/07/14/splunking-pitchfork-album-reviews/">Brian&#8217;s cool post</a>. </p>
<p>I just wanted to congratulate Nathan over at <a href="http://www.FlowingData.com">FlowingData</a> for crossing the <a href="http://flowingdata.com/2008/07/19/thank-you-everyone-for-reading-flowingdata/">3100 subscriber mark</a>. </p>
<div style="background-color:#FFFFFF;">
<img src="http://flowingdata.com/wp-content/themes/flowingdata-1-0/images/logo.gif" alt="flowingdata logo" />
</div>
<p>FlowingData is a fantastic example of the hidden value in the data all around us. As more and more of what we do is documented by computers the impact of statistics has become less of a hard-core math geek sport and more within the reach of anyone&#8217;s curiosity. His daily posts are a constant reminder of how statistics has become a crossover genre.  </p>
<p>Thank you Nathan!<br />
e</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/erik/2008/07/20/congrats-to-flowingdata-strength-in-subscriber-numbers/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Splunk for Virtualization</title>
		<link>http://blogs.splunk.com/erik/2008/03/27/splunk-for-virtualization/</link>
		<comments>http://blogs.splunk.com/erik/2008/03/27/splunk-for-virtualization/#comments</comments>
		<pubDate>Thu, 27 Mar 2008 21:14:54 +0000</pubDate>
		<dc:creator>erik</dc:creator>
		
		<category><![CDATA[Homepage]]></category>

		<category><![CDATA[api]]></category>

		<category><![CDATA[dev]]></category>

		<category><![CDATA[platform]]></category>

		<category><![CDATA[splunk base]]></category>

		<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/erik/2008/03/27/splunk-for-virtualization/</guid>
		<description><![CDATA[I&#8217;m looking for some help.
I&#8217;ve built a VMWare app for splunk and am in the process of doing the same for Xen. These Apps use the VMWare and Xensource API&#8217;s to index everything about the VM environment. When combined with splunk instances running within the guest OS you get a very comprehensive historical picture. I&#8217;m curious [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m looking for some help.<br />
I&#8217;ve built a VMWare app for splunk and am in the process of doing the same for Xen. These Apps use the VMWare and Xensource API&#8217;s to index everything about the VM environment. When combined with splunk instances running within the guest OS you get a very comprehensive historical picture. I&#8217;m curious: are there any splunk customers out there using VMWare or Xen? I&#8217;m looking for use cases so that i better understand how to configure the apps. I&#8217;d be curious to know what types of information would be useful to capture and what types of searches one would want to perform. Both Xen and VMWare have so much data available that configuration could be complicated. I&#8217;m trying to narrow it down to several useful out of the box configurations. If you have any thoughts comment here or email me at erik at splunk dot com.</p>
<p>Thanks<br />
e.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/erik/2008/03/27/splunk-for-virtualization/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Performance impact of fast drives (via sorkin)</title>
		<link>http://blogs.splunk.com/erik/2008/01/29/performance-impact-of-fast-drives-via-sorkin/</link>
		<comments>http://blogs.splunk.com/erik/2008/01/29/performance-impact-of-fast-drives-via-sorkin/#comments</comments>
		<pubDate>Wed, 30 Jan 2008 04:29:54 +0000</pubDate>
		<dc:creator>erik</dc:creator>
		
		<category><![CDATA[dev]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/erik/2008/01/30/performance-impact-of-fast-drives-via-sorkin/</guid>
		<description><![CDATA[The following is copped from a support email by Stephen Sorkin who is the man behind the splunk server curtain &#8230; thought it should go broader.

I&#8217;m the manager of the search and indexing team at Splunk. We&#8217;re still in the process of writing up our findings from storage benchmarks but here are the general details.
High [...]]]></description>
			<content:encoded><![CDATA[<p>The following is copped from a support email by Stephen Sorkin who is the man behind the splunk server curtain &#8230; thought it should go broader.</p>
<blockquote><p>
I&#8217;m the manager of the search and indexing team at Splunk. We&#8217;re still in the process of writing up our findings from storage benchmarks but here are the general details.</p>
<p>High IO/s typically means both faster indexing in general and faster searching of rare, temporally incoherent events. On average, we&#8217;ve seen indexing speeds increase by about 66% going from a 7200 RPM SATA RAID to a 15K RPM SCSI RAID. We&#8217;ve seen comparable performance from SCSI and SAS RAIDs, provided they&#8217;re 15K RPM.</p>
<p>The best benchmarking tool we&#8217;ve found for measuring how Splunk will behave on your disk hardware is bonnie++. If your disk subsystem can sustain 800 IO/s, you&#8217;re in good shape.</p>
<p>As far as searching goes, IO/s is the dominant factor for non-coherent, infrequently accessed search results. This means, if you&#8217;re just searching for the newest data, or even have to reach back through 1MM events to return 10k, the disk is NOT the bottleneck, since each individual read() will pull many events off disk. However, if you&#8217;re searching for a rare term, like a name, that occurs once an hour or once a day, each read() is going to require the drive arm to move. If you&#8217;re using a 7200 RPM SATA drive, that&#8217;s about 100 IO/s and hence on the order of 100 retrieved events per second. If you have a decent RAID, that could be 800 retrieved events per second.
</p></blockquote>
<p>-s</p>
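<p>To put rough numbers on the rare-term case, here is a back-of-the-envelope Python sketch. It assumes the worst case described in the quote, where every matching event costs one disk seek, so events retrieved per second is roughly equal to IO/s. The function name and the 30-day example are illustrative, not measurements.</p>

```python
def rare_term_search_seconds(matching_events, disk_iops):
    """Estimate retrieval time when each matching event needs its own seek."""
    if disk_iops > 0:
        return matching_events / disk_iops
    raise ValueError("disk_iops must be positive")


if __name__ == "__main__":
    # A term that shows up once an hour, searched over 30 days: 24 * 30 events.
    events = 24 * 30
    for label, iops in (("7200 RPM SATA drive", 100), ("decent RAID", 800)):
        seconds = rare_term_search_seconds(events, iops)
        print(label + ":", round(seconds, 1), "seconds")
```

<p>With these assumptions the 720 events take about 7.2 seconds at 100 IO/s and about 0.9 seconds at 800 IO/s. Searches over recent, densely packed data will not follow this estimate, since one read() there pulls many events at once.</p>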
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/erik/2008/01/29/performance-impact-of-fast-drives-via-sorkin/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Its about time - Preview #3</title>
		<link>http://blogs.splunk.com/erik/2008/01/29/its-about-time-preview-3/</link>
		<comments>http://blogs.splunk.com/erik/2008/01/29/its-about-time-preview-3/#comments</comments>
		<pubDate>Wed, 30 Jan 2008 02:39:05 +0000</pubDate>
		<dc:creator>erik</dc:creator>
		
		<category><![CDATA[dev]]></category>

		<category><![CDATA[preview]]></category>

		<category><![CDATA[release]]></category>

		<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/erik/2008/01/29/its-about-time-preview-3/</guid>
		<description><![CDATA[
Hey all,
It&#8217;s taken longer than we would have liked but our 3rd preview build has been posted.
Get&#8217;um here
A bunch of work has gone into windows stability, tons of bugs were fixed, and a bunch of customer requests have been implemented ( we will let you know out of band ). We expect that this release [...]]]></description>
			<content:encoded><![CDATA[<p><a href='http://blogs.splunk.com/devuploads/2008/01/picture-12.png' title='hex'><img src='http://blogs.splunk.com/devuploads/2008/01/picture-12.png' alt='hex' align="right" border="0" width="200" height="200" /></a><br />
Hey all,</p>
<p>It&#8217;s taken longer than we would have liked but our 3rd preview build has been posted.<br />
<a href="http://www.splunk.com/index.php/preview/20080129">Get&#8217;um here</a></p>
<p>A bunch of work has gone into windows stability, tons of bugs were fixed, and a bunch of customer requests have been implemented ( we will let you know out of band ). We expect that this release should be more stable, slightly faster, and less buggy.</p>
<p>Left to do, we still have a bunch of IE work, performance improvements, and cleaning up of some features like interactive field extraction and event type discovery.</p>
<p>It&#8217;s still not production ready so don&#8217;t even think of trying it out for real - and there is no guarantee that migration will work from a preview to GA ( we will migrate from 3.1.x to GA but not from preview ).  Also, don&#8217;t run splunk as root - it&#8217;s just not good to do until we run through all our testing.</p>
<p>As always, please send us feedback at splunkpreview@splunk.com or hit us up on IRC (irc.efnet.org #splunk).<br />
The last round of info from Preview #2 was awesome, please keep it up!</p>
<p>e.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/erik/2008/01/29/its-about-time-preview-3/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Just in time for new year - it&#8217;s Preview #2</title>
		<link>http://blogs.splunk.com/erik/2007/12/29/just-in-time-for-new-year-its-preview-2/</link>
		<comments>http://blogs.splunk.com/erik/2007/12/29/just-in-time-for-new-year-its-preview-2/#comments</comments>
		<pubDate>Sun, 30 Dec 2007 06:38:39 +0000</pubDate>
		<dc:creator>erik</dc:creator>
		
		<category><![CDATA[dev]]></category>

		<category><![CDATA[preview]]></category>

		<category><![CDATA[release]]></category>

		<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/erik/2007/12/29/just-in-time-for-new-year-its-preview-2/</guid>
		<description><![CDATA[Happy new year (bit early) all dev.splunk.com readers&#8230;.
We have just posted our second 3.2 preview release. (build number 30455)
It&#8217;s packed with holiday goodness, albeit very raw.
First you will notice we have posted a Windows build. It&#8217;s been in the cooker since last Feb and thanks to Mitch, Ledio, Igor and a bit of Amrit we [...]]]></description>
			<content:encoded><![CDATA[<p>Happy new year (bit early) all dev.splunk.com readers&#8230;.<br />
We have just posted our second <a href="http://www.splunk.com/index.php/preview/20071229">3.2 preview release</a>. (build number 30455)</p>
<p>It&#8217;s packed with holiday goodness, albeit very raw.</p>
<p>First you will notice we have posted a Windows build. It&#8217;s been in the cooker since last Feb, and thanks to Mitch, Ledio, Igor and a bit of Amrit we now have a single code base that rocks on linux, mac, solaris, freebsd, aix, AND windows.  This was not an easy feat, as evidenced by our gift of a <a href="http://valleywag.com/tech/silicon-valley-users-guide/understanding-geeks-++-the-100+word-version-331539.php">pony (soft and electronic)</a> to Mitch for his effort. It&#8217;s still very raw (the build, not the pony), and has a tendency to crash because of memory fragmentation and limited vm space, which will be fixed by GA&#8230; MarkB. will post more on the build so stay tuned for details. It&#8217;s a big deal for us, so be patient; we sure could use feedback on how to make it the best it can be.</p>
<p>Also in this release you will see the UI start to use async search results. Over the next few releases we will be moving to fully async search in the UI. It will take a few turns, but this preview has some of the first cut.</p>
<p>There are a bunch of other improvements; scheduled searches got a bit of a cleanup in the UI and the backend has been improved as well. Performance improvements, bug fixes, and other tweaks are also spread throughout. I&#8217;ll get others to post specifics.</p>
<p>In the meantime, as always, it&#8217;s a huge help to us in dev if you can kick the tires before we freeze for GA. Please send feedback to splunkpreview@splunk.com, post comments to this blog, or drop by and tell us in person.</p>
<p>Again, thanks for the help and happy new year from all of us in dev@splunk!</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/erik/2007/12/29/just-in-time-for-new-year-its-preview-2/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Preview #1 is up</title>
		<link>http://blogs.splunk.com/erik/2007/12/05/preivew-1-is-up/</link>
		<comments>http://blogs.splunk.com/erik/2007/12/05/preivew-1-is-up/#comments</comments>
		<pubDate>Wed, 05 Dec 2007 20:39:56 +0000</pubDate>
		<dc:creator>erik</dc:creator>
		
		<category><![CDATA[dev]]></category>

		<category><![CDATA[hacks]]></category>

		<category><![CDATA[preview]]></category>

		<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/erik/2007/12/05/preivew-1-is-up/</guid>
		<description><![CDATA[Splunk fans.
We have posted the first of many preview releases. You can find it here:
Our hope is that every week or two, as new features or APIs become usable, we will post builds and solicit feedback.
This first post has a bunch of backend and UI performance improvements as well as some new but hidden features:

live [...]]]></description>
			<content:encoded><![CDATA[<p>Splunk fans.</p>
<p>We have posted the first of many preview releases. You can find it <a href="http://www.splunk.com/index.php/preview/20071204">here</a>.<br />
Our hope is that every week or two, as new features or APIs become usable, we will post builds and solicit feedback.<br />
This first post has a bunch of backend and UI performance improvements as well as some new but hidden features:</p>
<ul>
<li>live searching of data</li>
<li>flexible roles</li>
<li>scripted authentication</li>
<li>event decoration ( for the xmas season )</li>
<li>auditing of splunk server actions</li>
<li>file system change detection</li>
<li>improved (proper) sub second support</li>
<li>transaction search</li>
<li>new experimental simple search interface</li>
<li>&#8220;where&#8221; support in the search clause (you don&#8217;t need to use &#8220;| where&#8221; anymore and can just search for foo=10)</li>
</ul>
<p>I&#8217;m not going to explain here what these things mean or how to find them or use them <img src='http://blogs.splunk.com/erik/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /><br />
Instead the product managers and developers will post here with ideas on what to try and what feedback we are looking for.</p>
<p>I&#8217;d like to thank in advance those brave few of you that have the few minutes to install these builds and give us your feedback.</p>
<p>e.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/erik/2007/12/05/preivew-1-is-up/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Splunk 3.2 Preview #1 is coming</title>
		<link>http://blogs.splunk.com/erik/2007/11/29/splunk-32-preview-1-is-coming/</link>
		<comments>http://blogs.splunk.com/erik/2007/11/29/splunk-32-preview-1-is-coming/#comments</comments>
		<pubDate>Fri, 30 Nov 2007 00:41:31 +0000</pubDate>
		<dc:creator>erik</dc:creator>
		
		<category><![CDATA[preview]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/erik/2007/11/29/splunk-32-preview-1-is-coming/</guid>
		<description><![CDATA[Hi all,
Just a heads up that we are moving to a model where we post previews of upcoming releases.
Starting now, we are going into a mode where long before a GA release we will be posting development builds. At first, they may be a few weeks apart but over time our goal is to post [...]]]></description>
			<content:encoded><![CDATA[<p>Hi all,</p>
<p>Just a heads up that we are moving to a model where we post previews of upcoming releases.</p>
<p>Starting now, we are going into a mode where we post development builds long before a GA release. At first they may be a few weeks apart, but over time our goal is to post builds as soon as new functionality or APIs are ready for comment.</p>
<p>Preview #1 will have backend performance and scale improvements as well as some cool new features. The developers and PMs will be posting to this blog the specifics of what is new, how to try it, and where we are going.</p>
<p>Our hope is that we get early feedback on new features and API&#8217;s before we actually ship.</p>
<p>Thanks in advance for helping try out our early wares.</p>
<p>Kind regards,</p>
<p>e.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/erik/2007/11/29/splunk-32-preview-1-is-coming/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Making reports faster by caching scheduled searches</title>
		<link>http://blogs.splunk.com/erik/2007/11/18/making-reports-faster-through-saved-searches/</link>
		<comments>http://blogs.splunk.com/erik/2007/11/18/making-reports-faster-through-saved-searches/#comments</comments>
		<pubDate>Mon, 19 Nov 2007 01:34:21 +0000</pubDate>
		<dc:creator>erik</dc:creator>
		
		<category><![CDATA[Homepage]]></category>

		<category><![CDATA[dev]]></category>

		<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://blogs.splunk.com/erik/2007/11/18/making-reports-faster-through-saved-searches/</guid>
		<description><![CDATA[I find this hard to explain even though it&#8217;s an extremely simple concept. It would be nice to get some feedback since I think we want to productize the idea but we are not clear on what makes sense.
If I have a search/report that I want to run faster, I will save that search and [...]]]></description>
			<content:encoded><![CDATA[<p>I find this hard to explain even though it&#8217;s an extremely simple concept. It would be nice to get some feedback since I think we want to productize the idea but we are not clear on what makes sense.</p>
<p>If I have a search/report that I want to run faster, I will save that search and have splunk run it over a small timeframe (5, 15, 30, 60 min), taking the results of that search/report and feeding them back into an index I create to hold cached results.</p>
<p>For example, suppose I like to run nightly reports where I show &#8220;top users by bandwidth&#8221;. It&#8217;s easy enough to run the report every night, but suppose there are times during the day when I want incrementals, or I want to look at last week, or perhaps get dailies over a month. Every time I run the search/report I need to search and recalculate &#8220;top users by bandwidth&#8221;, which over billions of events can take time <img src='http://blogs.splunk.com/erik/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>Instead, I&#8217;ll just save the search/report and have Splunk run it every 15 minutes with the results being sent to a &#8220;cache&#8221; index. This way if I ever want to do an adhoc search on &#8220;top users&#8221; or if I want to do &#8220;weekly reports by day&#8221; all the data is precalculated.  </p>
<p>Think of this as creating &#8220;logs&#8221; that are the output of a search/report and then having Splunk index those &#8220;logs&#8221;.  To get fast results you can then search/report on the summarized cached data.</p>
<p>If it&#8217;s not obvious why it&#8217;s faster, suppose you are indexing 500M events a day and 100M of those have bandwidth data. To report on &#8220;top bandwidth by users&#8221; I need to run a search to get the 100M events, then run the report across all 100M.<br />
If instead I ran that same search/report in the background over each hour interval and saved the data back into splunk, I would reduce the data I&#8217;m operating on from 100M down to 12,000 ( 24*500 ) rows per day (assuming that I&#8217;m getting the top 500 each hour). Searches/reports on the latter dataset are sub second versus the few minutes it would take to run across the 100M. </p>
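<p>To make the reduction concrete, here is a minimal Python sketch of the idea, not the reportcache script itself. It assumes each raw event is just a (user, bytes) pair, and the names <code>summarize_hour</code> and <code>daily_top_users</code> are made up for illustration. Each hour is collapsed to at most 500 rows, and the daily report is then answered from those rows only.</p>
<pre>
```python
from collections import Counter

TOP_N = 500           # rows kept per hourly summary (the "top 500")
HOURS_PER_DAY = 24

def summarize_hour(events, top_n=TOP_N):
    """Collapse one hour of (user, bytes) events into at most top_n rows."""
    totals = Counter()
    for user, nbytes in events:
        totals[user] += nbytes
    return totals.most_common(top_n)

def daily_top_users(hourly_summaries, top_n=10):
    """Answer the daily report from the cached hourly rows only."""
    totals = Counter()
    for rows in hourly_summaries:
        for user, nbytes in rows:
            totals[user] += nbytes
    return totals.most_common(top_n)

# Upper bound on cached rows per day: 24 hourly summaries of 500 rows each.
max_cached_rows = HOURS_PER_DAY * TOP_N

# Tiny hypothetical data set: two hours of raw bandwidth events.
hour_1 = [("alice", 300), ("bob", 120), ("alice", 50)]
hour_2 = [("bob", 400), ("carol", 10)]
summaries = [summarize_hour(hour_1), summarize_hour(hour_2)]
report = daily_top_users(summaries, top_n=2)
print(max_cached_rows, report)
```
</pre>
<p>One caveat with this kind of rollup: a user who never makes an hourly top 500 is dropped from the cache, so totals built from the summaries are only exact for users who appear in every hourly cut.</p>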
<p>Make sense? It&#8217;s really simple but odd to explain.</p>
<p>PART ONE - Setup:</p>
<ul>
<li>1. Grab the reportcache search script from <a href='http://blogs.splunk.com/devuploads/2007/11/reportcache.py' title='reportcache.py'>** here **</a> and put it in your <code>SPLUNK_HOME/etc/searchscripts</code> directory. There is no need to restart; you can now cache any search/report data.</li>
<li>2. Add a cache index - either add the following to your <code>etc/bundles/local/indexes.conf</code> or create a new bundle and add it to that bundle&#8217;s <code>indexes.conf</code>. You will need to restart splunk after adding the index.<br />
<code><br />
[cache]<br />
homePath   = $SPLUNK_DB/cache/db<br />
coldPath   = $SPLUNK_DB/cache/colddb<br />
thawedPath = $SPLUNK_DB/cache/thaweddb<br />
</code></li>
</ul>
<p>PART TWO - Testing by writing to a file:</p>
<p>I recommend that you first test reportcache by having it output to a file that you scan to make sure things look right. </p>
<ul>
<li>1. Find a search you want to cache. Simple candidate is something like the following report against the internal index that shows queue sizes by queue name.<br />
<code>index=_internal metrics "group=queue" | timechart avg(current_size) by name</code></li>
<li>2. Once you have a search you want to cache, add the command <code>reportcache index=cache path=/tmp file=testcache.log notimestamp</code> to the end. This assumes you have made an index named &#8220;cache&#8221;. The <code>index</code> attribute is required and you should not use your default index unless you know what you&#8217;re doing. We are also going to output the file to <code>/tmp/testcache.log</code> using the <code>file</code> and <code>path</code> attributes. The <code>notimestamp</code> option simply suppresses adding a timestamp to the filename.<br />
<code>index=_internal metrics "group=queue" | timechart avg(current_size) by name | reportcache index=cache path=/tmp file=testcache.log notimestamp</code></li>
<li>3. Run the search and you should get back the normal search results and not see an error on the screen. If you do see an error it should be self explanatory.</li>
<li>4. Open the file /tmp/testcache.log and make sure the results look ok. They should look like a bunch of lines key=value, key=value</li>
</ul>
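<p>If you would rather not eyeball the test file, a small Python sketch along these lines can do the check for you. It assumes the lines really are comma-separated key=value pairs with no commas inside the values (I have not shown the exact output format here, so treat that as an assumption), and the helper names and the sample line are made up for illustration.</p>
<pre>
```python
def parse_kv_line(line):
    """Split 'k1=v1, k2=v2' into a dict; raise ValueError on a malformed pair."""
    fields = {}
    for pair in line.strip().split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError("not a key=value pair: %r" % pair)
        fields[key.strip()] = value.strip().strip('"')
    return fields

def check_cache_file(path):
    """Return (good, bad) line counts for a reportcache test file."""
    good = bad = 0
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue  # ignore blank lines
            try:
                parse_kv_line(line)
                good += 1
            except ValueError:
                bad += 1
    return good, bad

# Hypothetical sample line; real field names depend on your search.
sample = 'name=parsingqueue, avg(current_size)=12.5'
print(parse_kv_line(sample))
```
</pre>
<p>Calling <code>check_cache_file("/tmp/testcache.log")</code> after the test search gives a quick count of lines that parse and lines that do not.</p>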
<p>PART THREE - Writing to an index:</p>
<ul>
<li>1.  We are now going to have the command put the results into the index. Simply remove the file, path and notimestamp attributes<br />
<code>index=_internal metrics "group=queue" | timechart avg(current_size) by name | reportcache index=cache</code></li>
<li>2. Run the command - you should again see normal results and no error.</li>
<li>3. Wait 30 seconds or so&#8230;</li>
<li>4. Run the following search to make sure results made it into the cache index - you should see your cache data after this search<br />
<code>index=cache</code></li>
<li>5. Now click on the report link and see if you can get your report back <img src='http://blogs.splunk.com/erik/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> This part is the somewhat odd part. All the fields should be as they were in the original search but many reports create keys with odd names. The best thing to do is to click around and see what reports you can make. You should be able to get back to the original search/report prior to the caching.</li>
</ul>
<p>PART FOUR - Enabling automatic caching:</p>
<p>After you have found and tested a search/report you want to cache moving forward:</p>
<ul>
<li>1. Save the search along with the reportcache command</li>
<li>2. Schedule the saved search on a small time frame ( 5, 15, 30, etc ) minutes</li>
<li>3. Test by waiting a few hours and looking at the results in the cache index.</li>
</ul>
<p>There is a good chance that either the above description was vague or that there is a bug / edge-case that I did not consider.<br />
One frequent problem I have seen is trying to cache data that has no timestamp. For example,<br />
<code>somesearch | top users</code><br />
will produce results without timestamps. This makes a mess of the cached data. If you have this problem, try rewriting your search to something like:<br />
<code>somesearch | stats count first(_time) by users | where users != "" | sort -count </code><br />
The above will produce data that has both the top users and timestamps.</p>
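<p>A rough way to picture why the timestamp matters, as a Python sketch rather than anything Splunk actually does internally: if you later want &#8220;weekly reports by day&#8221;, every cached row has to say which day it belongs to. The <code>rows_by_day</code> helper and the sample rows are hypothetical, and filing untimestamped rows under <code>None</code> is just this sketch&#8217;s way of showing that they cannot be placed.</p>
<pre>
```python
from collections import defaultdict
from datetime import datetime, timezone

def rows_by_day(rows):
    """Group cached rows by UTC day; rows with no _time are filed under None."""
    days = defaultdict(list)
    for row in rows:
        stamp = row.get("_time")
        if stamp is None:
            day = None
        else:
            day = datetime.fromtimestamp(stamp, tz=timezone.utc).date().isoformat()
        days[day].append(row["users"])
    return dict(days)

with_time = [
    {"users": "alice", "count": 3, "_time": 1195430400},  # 2007-11-19 00:00 UTC
    {"users": "bob", "count": 1, "_time": 1195516800},    # 2007-11-20 00:00 UTC
]
without_time = [{"users": "alice", "count": 3}, {"users": "bob", "count": 1}]

print(rows_by_day(with_time))
print(rows_by_day(without_time))
```
</pre>
<p>With timestamps the rows split cleanly into two days; without them everything lands in one undated pile, which is the mess described above.</p>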
<p>A few other things that are common requests:<br />
Often folks want to go back in time and create cached results for prior data. I have a script that can do that and will post it after more testing.<br />
Another common topic of conversation is the over-creation of summary data. In many cases it can be beneficial to cache more stuff than you initially need in case you want to run reports later. I&#8217;m trying to think of good ways to automatically do this for you.</p>
<p>** IMPORTANT ** - drop me a line and let me know how something like this *should* work. I suspect that we will add a &#8220;checkbox&#8221; to saved searches that will automatically do the right thing. </p>
<p>I&#8217;ll leave this post with the usage info from the top of the search script.</p>
<p># usage: &lt;some report search&gt; | reportcache &lt;reportcache options&gt;<br />
#   &lt;reportcache options&gt;<br />
#       file=[filename] - default is current time<br />
#       path=[path] - default is $SPLUNK_HOME/var/spool/splunk<br />
#       index=[indexname] - which index to target for results. If blank, will use whatever is bundled<br />
#       marker=[string] - a token or k=v used to mark the results for version or other delineation, or to defeat crc caching<br />
#       format=["csv"|"splunk"] - use the output format "splunk" for feeding back into splunk, or csv if you want to save for another tool<br />
#       appendtime - if true, this will append the current time. It&#8217;s useful when you want the results timestamped with now<br />
#       notimestamp - if this arg is supplied it will suppress the timestamp in the filename<br />
#       debug - if set, will just print the args to the screen</p>
<p># the following example will put in var/spool/splunk a file named foo without a timestamp, marked with "erik=nextrun", and targeted to index cache<br />
# index::_internal "group=pipeline" | timechart avg(executes) | reportcache file="foo" notimestamp marker="erik=nextrun" index="cache"</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.splunk.com/erik/2007/11/18/making-reports-faster-through-saved-searches/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
