The Collision of Big Data Analytics and Splunk

How people use Splunk is often a surprise to us – at least in how far they go beyond our original intent. Initially we thought of Splunk as a search engine for log files – Google for your logs, if you will – to help IT folks troubleshoot their complex systems. We quickly found that users started Splunking config files, network packets, source code, email, and more. Over the years our customers have dragged us into all sorts of new use cases: global windmill power-plant data analysis, protein structure prediction, or something as simple as analyzing user behavior on a website.

Lately we have started to see the collision of Splunk and big data analytics, usually with Hadoop-based tools, Vertica, Aster, Greenplum, and the like. In most cases there is complementary value, since these systems are better at some things than Splunk, but there are use cases where Splunk by itself is just fine. Either way, Splunk is getting dragged into the big data arena, since we are often the collectors and often the primary indexer of long-term historical data.

It was interesting to see Curt Monash, veteran database analyst and guru, post about Splunk. It was a very short introduction to Splunk, but our appearance on his list signals our entry into a larger big data discussion.

Many of our larger customers have Splunk for troubleshooting, monitoring, and real-time alerting, and have other tools such as Vertica, Aster, or others for doing analytics. Interestingly, we are starting to work out how the two can play well together. Both systems often require the same data, and with Splunk often collecting at the source, we are starting to see places where Splunk feeds these systems. I think it will be fun over the next year or so to see how the Hadoop movement, columnar stores, parallel SQL, and other technologies evolve along with Splunk. If you have one of these other systems and are curious how to better leverage Splunk alongside it, drop me a line. And do check out Curt's blog to keep up on what's happening in the next-gen database space.

Funny, I’d have thought it would be the other way around?
When you have large volumes of data, filtering, processing, and perhaps indexing make more sense in Hadoop. Data can then be fed into Splunk for presentation.
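A minimal sketch of the pre-filtering step the comment describes, written as a Hadoop-Streaming-style mapper in plain Python. The log format (space-separated timestamp, level, message) and the ERROR-only filter are hypothetical, just to illustrate cutting volume before anything is fed downstream:

```python
import sys


def map_filter(lines):
    """Keep only ERROR-level log lines and emit a trimmed record.

    Stand-in for a Hadoop Streaming mapper: reads raw log lines,
    drops everything below ERROR severity, and yields tab-separated
    (timestamp, message) pairs ready for a downstream indexer.
    """
    for line in lines:
        parts = line.rstrip("\n").split(" ", 2)  # timestamp level message
        if len(parts) == 3 and parts[1] == "ERROR":
            yield f"{parts[0]}\t{parts[2]}"


if __name__ == "__main__":
    # In Hadoop Streaming, stdin is the raw input split.
    for record in map_filter(sys.stdin):
        print(record)
```

The same function works unchanged whether it runs under Hadoop Streaming or as a plain filter in a shell pipeline.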

Given the business model for Splunk, the licensing costs don't make it the sensible choice for doing what can be done with Hadoop. The collection and aggregation can be done using syslog-ng.
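For what it's worth, collection and aggregation with syslog-ng as suggested might look something like this minimal config sketch (the port, file path, and driver choices here are illustrative assumptions, not anything from the post):

```
source s_net {
    udp(ip(0.0.0.0) port(514));   # receive syslog from the network
};

destination d_archive {
    # aggregate into one file per host per day
    file("/var/log/archive/$HOST/$YEAR-$MONTH-$DAY.log");
};

log {
    source(s_net);
    destination(d_archive);
};
```

The archived files can then be picked up by whatever does the heavy processing, with only the filtered results handed to a presentation layer.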

david
October 6, 2010
