Serendipity is….

“Serendipity is looking in a haystack for a needle and discovering a farmer’s daughter.”
- Julius Comroe

I just read this quote in a presentation from Matt Jones of BERG at the DXf conference. There is so much I love about this presentation that I don’t know where to start. Just click through it (embedded below) and have your own reaction. It’s clearly designed to be a fun, light read. I clicked through at about one slide every few seconds, then went back and stopped on a few that really spoke to me. It was entertainment that made me think, which then made me smile.

At its heart, Splunk is a time machine. It allows someone to go back in time and “see” what their world looked like at any given moment and to look for trends, anomalies, volume, momentum, etc. If you put enough data into Splunk, you can relive the past under a microscope: hit “play”, slow it down, speed it up, draw it on a chart, compare it to another time. We are all about time.

Add a Server or Two!

Every week I run into someone who is having performance issues and is not aware that you can just add another server or two or ten. I’ll travel to meet a company and ask how many servers they are using for Splunk to search/index/report on a terabyte a day. They will say a couple. I’ll then ask how many they have for a similar-sized Hadoop or data warehouse project. They will say 50 to 100X that number. Look, if you’re going to give these systems 300+ servers, can we please get 15?

Somehow we are failing to communicate that we scale like all other good architectures: by adding servers.

The following are hopefully some easy pictures to help tell the story. It should be extremely simple and straightforward, to the point of being obvious. If not, bug me and I’ll try again.

Exponential is the entrepreneur’s linear

I was in a meeting last Thursday where some “important people” (not sure if they want to be named) dropped the D word (“disruptive”) several times. They were presenting a slide that proved out an age-old (1994?) adage that the key to success is (or can be) a disruptive business model. It’s one thing for Professor Christensen to talk about it, and another when it’s bankers who have a slide for it. Personally I need to be reminded of its importance every day, since being disruptive was one of the most important guiding principles when founding Splunk. As we grow and become more established, I hope we continue to be a disruptive leader, though that goal certainly faces constant tension.

Hearing the D-word reminded me that I wanted to post about Steve Jurvetson’s recent video of a talk at Stanford. I can’t tell if the talk was great or if it just spoke to me because I needed an entrepreneurial boost. Also, I find myself very much into nano/bio engineering these days, as it seems to be the next wave of innovation, and DFJ is clearly backing some of the most innovative work. The great thing about Splunk is trying to make heads or tails of very large data, and the more time I spend with it, the more the bio/nano space looks like the next frontier.

Collision of big data analytics and splunk

How people use Splunk is often a surprise to us, or at least it goes beyond our original intent. Initially we thought of Splunk as a search engine for log files, Google for your logs if you will, to help IT folks troubleshoot their complex systems. Quickly we found that users started Splunking config files, network packets, source code, email, etc. Over the years our customers have been dragging us into all sorts of new use cases like global windmill power plant data analysis, protein structure prediction, or just something simple like analyzing user behavior on a website.

Lately we have started to see the collision of Splunk and big data analytics, usually with Hadoop-based tools, Vertica, Aster, Greenplum, etc. In most cases there is complementary value with these tools, as they are better at some things than Splunk, but there are use cases where Splunk by itself is just fine. Either way, Splunk is getting dragged into the big data area since we are often the collector and often the primary indexer of long-term historical data.

It was interesting to see Curt Monash, veteran database analyst and guru, post about Splunk. It was a very short introduction to Splunk, but our appearance on his list signals our entry into a larger big data discussion.

The Puppet Master Cometh

Last week Luke Kanies, The Master of Puppet, held a very well attended Puppet Camp here in SF. He drew a fantastic crowd from top-notch companies, and I was most impressed with the technical quality of the presentations and breakout sessions (quality food too!). These types of events can often be mundane or boring. This one was not. Kudos to Luke for building a quality community.

I had the pleasure of meeting Luke some three years ago at a BayLISA event, where I saw him win over a tough audience with an early incarnation of Puppet. It’s been fun watching him deliver on that early promise over the years and continue to win over a very tough crowd.

Recently I’ve been polling our customers on how they do configuration/change management. Interestingly, I have noticed people mostly fall into two camps:

  • A very large percentage use Puppet
  • An equally large percentage use nothing or something homegrown

It caught me off guard that such a large number use Puppet, and I was equally surprised that there was no #2 vendor solution. Great news for Luke and team.

(I’m Back!) The return of Splunk Free, as in Free Beer

*** Update 10/26/09 ***
Free is Back!!

Well, it never really went away, but now it’s easy to run the free version of Splunk.

Downloads still contain an Enterprise 60-day license, but you can convert to the free product at any time you like and use it like a champion.

Several months back, before the launch of 4.0, we were confronting all the work ahead. As always, we had to make hard decisions about what was in and what was out. In 4.0 we had re-implemented much of the UI and good chunks of the backend. With over 1000 paying customers, a potentially challenging upgrade process, and a huge testing task, we needed to reduce risk to the schedule and to product quality. It was a hard decision, but we reduced the GA risk by pulling out the Free product until we GA’d and fixed most of the critical bugs. Our guess was that it would take 45-90 days beyond the GA to get a few maintenance releases out before we could test the free product.

Search engine for virtual sprawl - vmware app for splunk

**** UPDATE - 10/31/08 ****
Hey all,
I’ve updated the app to version 1.8.
The only fix in this version is a bug with multiple datacenters.
Version 1.8 should now work for an unlimited number of datacenters.
( Thanks to Stephen for finding and letting me know )

As always feel free to bug me if the app has any problems.
e.

**** UPDATE - 10/10/08 ****

Hey all,
I updated the latest release - 1.7 - to fix a shutdown bug.
Turns out that in prior releases the VMWare app kept running after Splunk was shut down.
This release now terminates the VMWare app when splunkd goes away.

If you would like to test or run without Splunk you can pass in the standalone arg:
java -jar splunk.jar --standalone

** see instructions below on how to run the above command **
As usual, drop me a line if you have any questions.
Good luck with 1.7

**** UPDATE - 09/16/08 ****

Thanks to more testing I have found and fixed a few critical bugs.
Updated APP version 1.6 >> here <<

My favorite “customer” and Splunk as multi-tenant platform

Everyone has their favorite customer.
I have one too, and he is the CTO of a very cool IVR/VoIP platform. His name is RJ Auburn.

Around here his name is synonymous with filing 34 bugs between Sunday 9PM, when we push bits to the site, and 9AM, when we get in to the office. I don’t mean the usual UI-is-off-by-10-pixels kind, but complex indexing or distributed search bugs. Well, sometimes it’s a trivial thing we missed, but usually he is pushing Splunk to its limits. It’s not often that a CTO and “industry expert” is the one to personally put Splunk through its paces, but RJ is like that and gets his hands dirty, and Splunk is the better for it.

RJ and Voxeo are among a small but quickly growing number of companies that are using Splunk in a multi-tenant environment. This means using Splunk to collect data across multiple tenants in a hosted environment and then using Splunk for searching and reporting on a per-customer basis. Often the output of the searches/reports is rendered for the customer so they can see what is going on within the service. Customer dashboards and activity reports are a common use case for Splunk. Below are some of the images from the Voxeo service:

[image: Voxeo dashboard]
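
For readers who want the multi-tenant idea in concrete terms, here is a minimal sketch in plain Python. It is not Splunk’s API or search language, and it is not how Voxeo does it. The event records and the field names "tenant" and "status" are made up for illustration. It only shows the shape of the problem: collect events from every tenant in one place, then report on a per-customer basis so each customer sees only their own numbers.

```python
from collections import Counter, defaultdict

# Hypothetical sample events. The "tenant" and "status" field names are
# assumptions for illustration only, not a Splunk schema.
EVENTS = [
    {"tenant": "acme", "status": "ok"},
    {"tenant": "acme", "status": "error"},
    {"tenant": "globex", "status": "ok"},
    {"tenant": "acme", "status": "ok"},
]


def per_tenant_report(events):
    """Group events by tenant and count each status value per tenant."""
    report = defaultdict(Counter)
    for event in events:
        report[event["tenant"]][event["status"]] += 1
    # Convert to plain dicts so the result is easy to print or serialize.
    return {tenant: dict(counts) for tenant, counts in report.items()}


def report_for(events, tenant):
    """Return only the counts belonging to one tenant (empty if unknown)."""
    return per_tenant_report(events).get(tenant, {})


if __name__ == "__main__":
    # Each customer's dashboard would be fed only its own slice.
    print(report_for(EVENTS, "acme"))
```

The one design point worth noting is that the filtering by tenant happens on the server side, in report_for, so a customer-facing dashboard never receives another customer’s data.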

Congrats to FlowingData - strength in (subscriber) numbers!

We here at Splunk are into processing lots of data. Our external marketing focuses mostly on hardcore IT data, but internally we play with all sorts of data sets: government stats, sports stats, even music, as shown by Brian’s cool post.

I just wanted to congratulate Nathan over at FlowingData for crossing the 3100 subscriber mark.


FlowingData is a fantastic example of the hidden value in the data all around us. As more and more of what we do is documented by computers, statistics has become less of a hard-core math geek sport and more within the reach of anyone’s curiosity. His daily posts are a constant reminder of how statistics has become a crossover genre.

Thank you Nathan!
e

Splunk for Virtualization

I’m looking for some help.
I’ve built a VMWare app for Splunk and am in the process of doing the same for Xen. These apps use the VMWare and Xensource APIs to index everything about the VM environment. When combined with Splunk instances running within the guest OS you get a very comprehensive historical picture. Are there any Splunk customers out there using VMWare or Xen? I’m looking for use cases so that I better understand how to configure the apps. I’d like to know what types of information would be useful to capture and what types of searches one would want to perform. Both Xen and VMWare have so much data available that configuration could be complicated. I’m trying to narrow it down to several useful out-of-the-box configurations. If you have any thoughts, comment here or email me at erik at splunk dot com.

Thanks
e.
