erik: dev

Search engine for virtual sprawl - vmware app for splunk

**** UPDATE - 10/31/08 ****
Hey all,
I’ve updated the app to version 1.8.
The only change in this version is a fix for a bug with multiple datacenters.
Version 1.8 should now work for an unlimited number of datacenters.
(Thanks to Stephen for finding it and letting me know.)

As always, feel free to bug me if the app has any problems.
e.

**** UPDATE - 10/10/08 ****

Hey all,
I’ve updated the app to release 1.7 to fix a shutdown bug.
It turns out that in prior releases, when Splunk was shut down, the VMWare app kept running.
This release will now terminate the VMWare app when splunkd goes away.

If you would like to test or run the app without splunk, you can pass in the --standalone argument:
java -jar splunk.jar --standalone

** see instructions below on how to run the above command **
As usual, drop me a line if you have any questions.
Good luck with 1.7

**** UPDATE - 09/16/08 ****

Thanks to more testing, I have found and fixed a few critical bugs.
Updated app version 1.6 >> here <<

My favorite “customer” and Splunk as a multi-tenant platform

Everyone has their favorite customer.
I have one too. He is the CTO of a very cool IVR/VoIP platform, and his name is RJ Auburn.

Around here, RJ is synonymous with filing 34 bugs between Sunday 9 PM, when we push bits to the site, and 9 AM, when we get into the office. I don’t mean the usual UI-is-off-by-10-pixels bugs, but complex indexing or distributed search bugs. Sometimes it’s a trivial thing we missed, but usually he is pushing splunk to its limits. It’s not often that a CTO and “industry expert” personally puts splunk through its paces, but RJ is like that. He gets his hands dirty, and splunk is better for it.

RJ and Voxeo are among a small but quickly growing number of companies that use splunk in a multi-tenant environment. This means using splunk to collect data across multiple tenants in a hosted environment and then searching and reporting on a per-customer basis. Often the output of the searches and reports is rendered for the customer so they can see what is going on within the service. Customer dashboards and activity reports are a common use case for splunk. Below are some of the images from the Voxeo service:

[Image: Voxeo dashboard]

Splunk for Virtualization

I’m looking for some help.
I’ve built a VMWare app for splunk and am in the process of doing the same for Xen. These apps use the VMWare and XenSource APIs to index everything about the VM environment. When combined with splunk instances running within the guest OS, you get a very comprehensive historical picture. I’m curious: are there any splunk customers out there using VMWare or Xen? I’m looking for use cases so that I can better understand how to configure the apps. I’d like to know what types of information would be useful to capture and what types of searches you would want to perform. Both Xen and VMWare have so much data available that configuration could get complicated, so I’m trying to narrow it down to several useful out-of-the-box configurations. If you have any thoughts, comment here or email me at erik at splunk dot com.

Thanks
e.

Performance impact of fast drives (via sorkin)

The following is copped from a support email by Stephen Sorkin, the man behind the splunk server curtain. I thought it should go to a broader audience.

I’m the manager of the search and indexing team at Splunk. We’re still in the process of writing up our findings from storage benchmarks but here are the general details.

High IO/s typically means both faster indexing in general and faster searching of rare, temporally incoherent events. On average, we’ve seen indexing speeds increase by about 66% going from a 7200 RPM SATA RAID to a 15K RPM SCSI RAID. We’ve seen comparable performance from SCSI and SAS RAIDs, provided they’re 15K RPM.

The best benchmarking tool we’ve found for measuring how Splunk will behave on your disk hardware is bonnie++. If your disk subsystem can sustain 800 IO/s, you’re in good shape.

It’s about time - Preview #3

Hey all,

It’s taken longer than we would have liked but our 3rd preview build has been posted.
Get ’em here

A lot of work has gone into Windows stability, tons of bugs have been fixed, and a bunch of customer requests have been implemented (we will let you know out of band). We expect this release to be more stable, slightly faster, and less buggy.

Still left to do: a bunch of IE work, performance improvements, and cleanup of some features like interactive field extraction and event type discovery.

It’s still not production ready, so don’t even think of running it for real. There is also no guarantee that migration from a preview to GA will work (we will migrate from 3.1.x to GA, but not from a preview). Also, don’t run splunk as root; it’s just not a good idea until we run through all our testing.

As always, please send us feedback at splunkpreview@splunk.com or hit us up on IRC (irc.efnet.org #splunk).
The last round of feedback from Preview #2 was awesome, so please keep it up!

e.

Just in time for the new year - it’s Preview #2

Happy new year (a bit early) to all dev.splunk.com readers…
We have just posted our second 3.2 preview release. (build number 30455)

It’s packed with holiday goodness, albeit very raw.

First, you will notice that we have posted a Windows build. It’s been in the cooker since last February, and thanks to Mitch, Ledio, Igor, and a bit of Amrit, we now have a single code base that rocks on Linux, Mac, Solaris, FreeBSD, AIX, AND Windows. This was not an easy feat, as evidenced by our gift of a pony (soft and electronic) to Mitch for his effort. It’s still very raw (the build, not the pony) and has a tendency to crash because of memory fragmentation and limited VM space, which will be fixed by GA. MarkB will post more on the build, so stay tuned for details. It’s a big deal for us, so be patient; we sure could use feedback on how to make it the best it can be.

Also in this release, you will see the UI start to receive some async search results. Over the next few releases, we will be moving to fully async search in the UI. It will take a few turns, but this preview has some of the first cut.

Preview #1 is up

Splunk fans.

We have posted the first of many preview releases. You can find them here:
Our hope is to post a build every week or two, as new features or APIs become usable, to solicit feedback.
This first post has a bunch of backend and UI performance improvements as well as some new but hidden features:

  • live searching of data
  • flexible roles
  • scripted authentication
  • event decoration (for the xmas season)
  • auditing of splunk server actions
  • file system change detection
  • improved (proper) sub-second support
  • transaction search
  • new experimental simple search interface
  • “where” support in the search clause (you don’t need to use “| where” anymore and can just search for foo=10)

I’m not going to explain here what these things mean or how to find them or use them ;-)
Instead the product managers and developers will post here with ideas on what to try and what feedback we are looking for.

I’d like to thank in advance those brave few of you who take a few minutes to install these builds and give us your feedback.

e.

Making reports faster by caching scheduled searches

I find this hard to explain even though it’s an extremely simple concept. It would be nice to get some feedback, since I think we want to productize the idea but we are not yet clear on what makes sense.

If I have a search/report that I want to run faster, I save that search and have splunk run it on a schedule over a small timeframe (5, 15, 30, or 60 minutes), feeding the results of each run back into an index I create to hold cached results.

For example, suppose I like to run nightly reports that show “top users by bandwidth”. It’s easy enough to run the report every night, but suppose there are times during the day when I want incrementals, or I want to look at last week, or perhaps get dailies over a month. Every time I run the search/report, I need to search and recalculate “top users by bandwidth”, which, over billions of events, can take a while ;-)

Instead, I’ll just save the search/report and have Splunk run it every 15 minutes, with the results sent to a “cache” index. This way, if I ever want to do an ad hoc search on “top users” or “weekly reports by day”, all the data is precalculated, and the report only has to combine the cached 15-minute results instead of rescanning the raw events.
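The caching idea above can be sketched in a few lines of Python. This is a hypothetical illustration of the concept, not how splunk implements scheduled searches: events are assumed to be (timestamp, user, bytes) tuples, a plain dict stands in for the “cache” index, and the helper names bucket_start, summarize, and top_users are made up for the example. Each 15-minute window plays the role of one scheduled run, and an ad hoc “top users” report over any range just adds up the cached windows.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

WINDOW_MINUTES = 15  # schedule interval of the hypothetical saved search


def bucket_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 15-minute window."""
    return ts.replace(minute=ts.minute - ts.minute % WINDOW_MINUTES,
                      second=0, microsecond=0)


def summarize(events):
    """Simulate the scheduled search: one 'top users by bandwidth' summary
    per 15-minute window, kept in a dict standing in for the cache index."""
    cache = defaultdict(Counter)
    for ts, user, nbytes in events:
        cache[bucket_start(ts)][user] += nbytes
    return cache


def top_users(cache, start, end, n=3):
    """Ad hoc report over [start, end): add up the cached window summaries
    instead of re-reading every raw event."""
    total = Counter()
    for window, per_user in cache.items():
        if start <= window < end:
            total.update(per_user)  # Counter.update adds counts
    return total.most_common(n)


t0 = datetime(2008, 1, 1, 9, 0)
events = [
    (t0 + timedelta(minutes=1), "alice", 500),
    (t0 + timedelta(minutes=7), "bob", 200),
    (t0 + timedelta(minutes=20), "alice", 300),
    (t0 + timedelta(minutes=22), "carol", 900),
]
cache = summarize(events)
print(top_users(cache, t0, t0 + timedelta(hours=1)))
# [('carol', 900), ('alice', 800), ('bob', 200)]
```

The trade-off is granularity: cached results can only be combined at window boundaries, which is why the schedule interval (5, 15, 30, or 60 minutes) is worth choosing per report.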

Don’t forget to index your config files!


Why?

Because splunk is a great way to track changes and see differences in your configs.
In most troubleshooting and compliance situations, having a historical record of all your configurations just goes hand in hand with the log data. They are two sides of the same coin.

The cool thing is that it takes just a few seconds to get up and running. If you have splunk installed, it’s all but free to index your configs, since they are small compared to log files. Even if you indexed all the configs in a 2000-machine deployment, it would not come close to the volume of even a small proxy log.

30-second refresher:
Just tail /etc and you will capture most of the interesting configs on your box.

from the cli:
> splunk add tail /etc

or, in the UI, just add a tail on /etc

That’s it. That is all you need to do.

** note ** you should grab 3.1 (http://download.splunk.com), as there were some bugs in 3.0’s config processing.

Jobs @ splunk

A standard preamble about life at splunk.

We are always looking for passionate and intelligent engineers regardless of their background, musical preference, the grades they got in college, or their ability to prove some esoteric math theorem. We react best to people who are creative and think for themselves - people who are smarter than shit but don’t overthink the wrong thing.

Much of our company’s identity (the product, its features, how it’s used, how we talk about it, our branding) comes mainly from the development staff, and we plan to keep it that way. Our philosophy is based on the idea that 10 diverse, smart people in a room are better than 25 decent yet uninspired engineers, so although we need to grow fast, we end up going slow because we are picky.

A couple of other data points:

  • we like to have fun while at work
  • we have smart people, challenging problems, and an interesting architecture
  • we practice a liberal interpretation of Scrum
  • our code is cross platform and *resembles* open source development models
  • we like nice hardware (CPU, monitor, etc.) and don’t care what OS you use