erik: Homepage

Search engine for virtual sprawl - vmware app for splunk

**** UPDATE - 10/31/08 ****
Hey all,
I’ve updated the app to version 1.8.
The only fix in this version is a bug with multiple datacenters.
Version 1.8 should now work for an unlimited number of datacenters.
( Thanks to Stephen for finding and letting me know )

As always feel free to bug me if the app has any problems.
e.

**** UPDATE - 10/10/08 ****

Hey all,
I updated the latest release - 1.7 - to fix a shutdown bug.
Turns out that in prior releases the VMWare app kept running after Splunk was shut down.
This release now terminates the VMWare app when splunkd goes away.

If you would like to test or run without splunk you can pass in the standalone arg:
java -jar splunk.jar --standalone

** see instructions below on how to run the above command **
As usual, drop me a line if you have any questions.
Good luck with 1.7

**** UPDATE - 09/16/08 ****

Thanks to more testing I have found and fixed a few critical bugs.
Updated APP version 1.6 >> here <<

My favorite “customer” and Splunk as a multi-tenant platform

Everyone has their favorite customer.
I have one too and he is the CTO of a very cool IVR/VoIP platform. His name is RJ Auburn
rj

Around here RJ is synonymous with filing 34 bugs between Sunday 9PM, when we push bits to the site, and 9AM, when we get in to the office. I don't mean the usual UI-is-off-by-10-pixels bugs but complex indexing or distributed search bugs. Well, sometimes it's a trivial thing we missed, but usually he is pushing splunk to its limits. It's not often that a CTO and “industry expert” is the one to personally put splunk through its paces - but RJ is like that and gets his hands dirty - and splunk is the better for it.

RJ and Voxeo are part of a small but quickly growing number of companies that are using splunk in a multi-tenant environment. This means using splunk to collect data across multiple tenants in a hosted environment and then using splunk for searching and reporting on a per-customer basis. Often the output of the searches/reports is rendered for the customer so they can see what is going on within the service. Customer dashboards and activity reports are a common use case for splunk. Below are some of the images from the Voxeo service:

vox dash

Congrats to FlowingData - strength in (subscriber) numbers!

We here at splunk are into processing lots of data. Our external marketing focuses mostly on hardcore IT data but internally we play with all sorts of data sets: government stats, sports stats, even music, as shown by Brian's cool post.

I just wanted to congratulate Nathan over at FlowingData for crossing the 3100 subscriber mark.

flowingdata logo

FlowingData is a fantastic example of the hidden value in the data all around us. As more and more of what we do is documented by computers the impact of statistics has become less of a hard-core math geek sport and more within the reach of anyone’s curiosity. His daily posts are a constant reminder of how statistics has become a crossover genre.

Thank you Nathan!
e

Splunk for Virtualization

I’m looking for some help.
I’ve built a VMWare app for splunk and am in the process of doing the same for Xen. These apps use the VMWare and Xensource APIs to index everything about the VM environment. When combined with splunk instances running within the guest OS you get a very comprehensive historical picture. I’m curious: are there any splunk customers out there using VMWare or Xen? I’m looking for use cases so that I better understand how to configure the apps. I’d like to know what types of information would be useful to capture and what types of searches one would want to perform. Both Xen and VMWare have so much data available that configuration could be complicated. I’m trying to narrow it down to several useful out-of-the-box configurations. If you have any thoughts, comment here or email me at erik at splunk dot com.

Thanks
e.

Making reports faster by caching scheduled searches

I find this hard to explain even though it's an extremely simple concept. It would be nice to get some feedback since I think we want to productize the idea but we are not clear on what makes sense.

If I have a search/report that I want to run faster, I will save that search and have splunk run it over a small timeframe (5, 15, 30, 60 min), taking the results of that search/report and feeding them back into an index I create to hold cached results.

For example, suppose I like to run nightly reports where I show “top users by bandwidth”. It's easy enough to run the report every night, but suppose there are times during the day when I want incrementals, or I want to look at last week, or perhaps get dailies over a month. Every time I run the search/report I need to search and recalculate “top users by bandwidth”, which over billions of events can take time ;-)

Instead, I’ll just save the search/report and have Splunk run it every 15 minutes with the results being sent to a “cache” index. This way if I ever want to do an adhoc search on “top users” or if I want to do “weekly reports by day” all the data is precalculated.
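
To make the idea concrete, here is a minimal sketch in plain Python, not splunk's actual mechanism. It assumes events are (timestamp, user, bytes) tuples; the names `summarize`, `top_users`, and the in-memory `cache` are hypothetical stand-ins for the saved search, the report, and the “cache” index.

```python
from collections import Counter, defaultdict
from datetime import datetime

BUCKET_MINUTES = 15  # assumed schedule interval for the saved search


def bucket_start(ts):
    """Round a timestamp down to the start of its 15-minute bucket."""
    return ts.replace(minute=ts.minute - ts.minute % BUCKET_MINUTES,
                      second=0, microsecond=0)


def summarize(events, cache):
    """Stand-in for the scheduled search: fold raw events into per-bucket totals."""
    for ts, user, nbytes in events:
        cache[bucket_start(ts)][user] += nbytes


def top_users(cache, start, end, n=3):
    """Stand-in for the report: add up cached buckets in [start, end)
    instead of rescanning the raw events."""
    totals = Counter()
    for bucket, per_user in cache.items():
        if start <= bucket < end:
            totals.update(per_user)
    return totals.most_common(n)


# Hypothetical sample data; a real run would only see the last 15 minutes.
cache = defaultdict(Counter)
events = [
    (datetime(2008, 10, 1, 9, 3), "alice", 500),
    (datetime(2008, 10, 1, 9, 20), "bob", 900),
    (datetime(2008, 10, 1, 9, 25), "alice", 700),
    (datetime(2008, 10, 2, 14, 0), "carol", 100),
]
summarize(events, cache)
report = top_users(cache, datetime(2008, 10, 1), datetime(2008, 10, 2))
print(report)
```

The point of the design is that any report over a day, a week, or a month only has to add up small precalculated buckets, so its cost depends on the number of buckets rather than the number of raw events.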

The Feature Magpie Phenomenon

Having lived through the software-as-building-architecture argument every few years, I am accustomed to thinking of (refuting) how software and software development are (not) like the traditional field of building design and development. The analogy-driven mind needs something to reference, and I guess we noobs in software are desperate to find something historical to feel validated.
Every discipline needs a role model and building design seems to be our adopted hero.

This post proposes an analogy that is far less intellectual than a typical comparison between Christopher Alexander and the design of an EventLoop Abstract Factory class.
My analogy here is based more on THIS weekend with MY wife.

Our house is full of stuff.
This stuff - chairs, tables, artwork, rugs, … - we have acquired for good reason, and we could use it. The problem is that despite all good intentions these hand-me-downs, gifts, and rash purchases sometimes just don’t work.

Yes we need a coffee table in the front room - but not THAT one.
Maybe the design is wrong.
Maybe the size is wrong.
Maybe the idea of a coffee table that is also a fireplace seemed like a good idea at the time but WTF.
coffeetable

Reliable syslog/tcp input - splunk bundle style

Wanted to drop this someplace for feedback.
Splunk is often hooked up to syslog(ng) or tcp ports.
Customers then shoot data as fast as they can at splunk.

You can have splunk buffer inputs or have the sender buffer, but in many cases this is less than optimal - it's usually not a good idea to rely on sender-side buffering.

As an interesting alternative you can use a splunk bundle to catch data off the network port and spew it to one or more files, and have splunk tail those files at its leisure. If splunk can keep up, it will be seconds before you can search the data. If you get a huge burst, no problem: the bundle will just go to disk and splunk will be right behind. Furthermore, if someone wanted to restart splunk ( or splunk were to crash - yes it happens ) then again, the data is just going to disk.
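
As a rough sketch of the catch-and-spool half of this idea (a standalone Python listener, not an actual splunk bundle), something like the following would do. The class names, the `serve` helper, and the spool file name are all hypothetical; splunk would be pointed at the spool file separately.

```python
import socketserver
import threading


class SpoolHandler(socketserver.StreamRequestHandler):
    """Append every line received on one TCP connection to the spool file."""

    def handle(self):
        with open(self.server.spool_path, "ab") as spool:
            for line in self.rfile:
                # Serialize writers so lines from different senders don't interleave.
                with self.server.spool_lock:
                    spool.write(line)
                    spool.flush()  # get it to the OS promptly so a tailer sees it


class SpoolServer(socketserver.ThreadingTCPServer):
    """One thread per sender; all of them write to the same spool file."""

    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address, spool_path):
        super().__init__(address, SpoolHandler)
        self.spool_path = spool_path
        self.spool_lock = threading.Lock()


def serve(host="127.0.0.1", port=0, spool_path="tcp_spool.log"):
    """Start the listener in a background thread and return the server.

    port=0 asks the OS for a free port; read it from server.server_address.
    """
    server = SpoolServer((host, port), spool_path)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Because the listener only ever appends to disk, it keeps accepting data whether the indexer is keeping up, restarting, or down. A real deployment would also need file rotation and cleanup, which this sketch leaves out.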

Beer Pong @ Splunk

Come Friday at 5PM - the table came out and it was time for Beer Pong.
Myself, I had not heard of Beer Pong until Nick Mealy (in picture below on right) explained.
He has an annual pilgrimage for a week to play and pointed out that there are actual leagues.

Splunk is all about the proper Beer Pong - with paddles - not Beirut style.
I’m not really up on the details but it goes something like this - you place cups of beer ( see our double tap keg in background of picture on far left ) on the table and your opponents try to hit the ball into the cup, forcing you to drink. I think these are roughly the rules as played at splunk.

Unfortunately, I only had an iPhone to take the picture. Next time I’ll get a movie.

** Friday Beer Pong @ Splunk **

None of this could happen without our Beer Man from Mikes Liquors. Must be SF’s best - we call in with an order and hours later our Man ( see below ) shows up in his Beer Guy jacket to rack the booze.

$SPLUNK_HOME

For the first few years it was in garages and basements. Then we graduated to squatting with friends ( thank you sixapart, boulder ventures, and sevin rosen ). Finally we scored our own space in SOMA - just across from the PacBell/ATT/Verizon/TMobile/Comcast Park.

… 4th and 5th floor in the taller of the buildings …
street

Why SF?
Some of us live in the north bay to Santa Rosa and beyond.
Some of us live out in the east bay out to Walnut Creek and beyond.
And of course some of us folks live down in Cupertino, MtV, and Sunnyvale.

Our space is nicer than we deserve - bad omen or not - etrade bought the building during the height of the boom and decked it out with $17M in TI’s.
Then before they could move in they got adjusted - along with most of us.

… can’t really tell here but it's a nice space …
inside
Above pic is our patch on the 4th floor where we keep it dark.
4th floor is all dev: no lights, lots of coffee, lots of booze (need to post pic of liquor cabinet and kegs), foosball all the time, wii all the time, bad jokes, etc. - all serious productivity enhancers.

Splunking the most abundant time based dataset on the planet

What is the most abundant time-based data set that *everyone* works with?

It ain’t logs - it's email.

If you think about it, email messages are a bit “event like” - they have a timestamp, a somewhat structured header, and a payload.

Since splunk was designed for time based datasets it’s only natural that we hook it up to email. I’m not suggesting that you use splunk as your mail reader ( although i’m working on a few actions for forward, reply, etc ) but that in a datacenter, email often carries critical workflow information.

In our own infrastructure we have systems generating email notifications for things like support cases, changes to source code, open bugs, etc. It's interesting to bring the mail into the mix with my logs, config changes, etc. Once my mail is indexed I can instantly report on frequency of customer issues (support case email), changes in source code by file/user (perforce checkin/diff email), coded bugs by user per week (Jira bug notification emails), or just report on my own inbox - messages by size by time/sender/etc.
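
To illustrate how “event like” a message is, here is a small sketch using Python's standard `email` module. It is not how splunk itself ingests mail; the function name `email_to_event`, the field names in the resulting dict, and the sample message are all made up for illustration.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def email_to_event(raw):
    """Turn one raw RFC 822 message into a flat, timestamped event dict."""
    msg = message_from_string(raw)
    return {
        "_time": parsedate_to_datetime(msg["Date"]),  # the event timestamp
        "from": msg["From"],                          # structured header fields
        "subject": msg["Subject"],
        "size": len(raw),                             # handy for size reports
        "body": msg.get_payload(),                    # the payload
    }


# Hypothetical support-case notification.
RAW = """\
From: support@example.com
To: erik@example.com
Subject: [case 1234] login failure
Date: Fri, 31 Oct 2008 09:15:00 -0700

Customer cannot log in.
"""

event = email_to_event(RAW)
print(event["_time"].isoformat(), event["from"], event["subject"])
```

Once each message is reduced to a timestamp plus a few fields like this, reports such as “messages by sender per week” are the same kind of query you would run over logs.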