SQL + Splunk = SplunkMSE

Introducing SplunkMSE (Splunk MySQL Storage Engine).

SQL is the lingua franca of structured data. Likewise, Splunk is the way to work with highly unstructured data generated in the data center. Data residing in relational databases can be analyzed via a plethora of off-the-shelf tools such as Excel, Tableau, Cognos and Crystal Reports, and SQL is well known by developers everywhere. What better idea than using these tools to work with data that lives within Splunk?

SplunkMSE is fully open source. Visit SplunkMSE’s home site for downloads, installation instructions, detailed documentation, source code and more. While there, I encourage you to ask questions, file bugs and, if the overwhelming urge to fix them should arise, feel free …

» Continue reading

The SSL Performance Odyssey

When you come to dev.splunk.com, you see pictures of beer pong, full bars, stuffed ponies with fart machines taped to their ass, etc. – basically engineers gone wild. Somewhere between all of this insanity, we actually find the time to write code and solve problems like this one. This post is all about a crazy-weird performance issue that we were experiencing, how it manifested itself and ultimately how it was fixed.
I suspect others may be having this problem, as the problem lives in some very popular open source code as far as I can tell. With that, I’ll begin telling you about my journey into hell.

Splunk has a homegrown embedded HTTP(S) server that serves up all external interfaces …

» Continue reading

Diagraming Splunk’s data-flow (part 2 – performance overlays)

In my previous post, “Diagraming Splunk’s data-flow”, I wrote a small Python script that parsed Splunk’s runtime environment ($SPLUNK_HOME/var/run/splunk/composite.xml) and generated a file which, when fed into Graphviz, would produce a nice architectural diagram of how pipelines and processors are wired together.
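
To give a rough idea of the Graphviz half of that approach, here is a minimal sketch that turns a mapping of pipelines to processors into DOT text. It is not the original script: the function name to_dot, the pipeline names and the processor names are all made up for illustration, and since the layout of composite.xml is not shown here, the sketch starts from a plain Python dictionary rather than parsing the XML.

```python
# Hypothetical sketch: pipeline and processor names are illustrative only.
# The real composite.xml schema is not shown, so no XML is parsed here.

def to_dot(pipelines):
    """Return Graphviz DOT text for a mapping of
    pipeline name -> ordered list of processor names."""
    lines = ["digraph splunk {", "  rankdir=LR;"]
    for index, (pipeline, processors) in enumerate(pipelines.items()):
        # One cluster per pipeline so Graphviz draws a box around its processors.
        lines.append(f"  subgraph cluster_{index} {{")
        lines.append(f'    label="{pipeline}";')
        # Qualify node ids with the pipeline name so that identical
        # processor names in different pipelines do not collide.
        nodes = [f'"{pipeline}:{name}"' for name in processors]
        for node, name in zip(nodes, processors):
            lines.append(f'    {node} [label="{name}"];')
        # Chain the processors in execution order.
        for src, dst in zip(nodes, nodes[1:]):
            lines.append(f"    {src} -> {dst};")
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


# Made-up example data standing in for whatever the real parser would extract.
example = {
    "parsing": ["reader", "line_breaker", "sender"],
    "indexing": ["receiver", "indexer"],
}
dot_text = to_dot(example)
print(dot_text)
```

Saving the printed text to a file such as pipelines.dot and running `dot -Tpng pipelines.dot -o pipelines.png` should render one box per pipeline with its processors chained left to right.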

In this installment, I took it to the next level by using Splunk’s search capability to overlay performance metrics on the diagram. The combination of Splunk logging metrics information for each processor within each pipeline (thanks Brad) and the ability to have Splunk execute a search processor written in Python made this possible. Here is how you use it:

First, download Graphviz. I particularly like the OSX application that they’ve written because you can see the graph …

» Continue reading

Diagraming Splunk’s data-flow

This blog entry is not about how the framework works. It is about a semi-cool visualization that I created using Python and Graphviz. If you watched the video where I presented Splunk’s framework architecture from a high level, you know what pipelines and processors are. If you haven’t, here is a very quick overview.

  • A pipeline is a thread of execution that lives within the splunkd process. Each pipeline executes a series of processors, each of which operates on data. The data is created when the first processor on the pipeline reads it from some input (like tailing a file, or receiving it on a network port). Each processor then does something to the data. Eventually, the data …
» Continue reading

Software configuration – why does this wheel need re-invention?

I have worked on so many software projects that I can’t possibly enumerate them. Most of my contribution to these projects has been on the server side of things. Every one of these projects needed to be configured in some way, shape or form, and I just realized that every one of them had its own configuration subsystem that was implemented from scratch. Many of these configurations could be managed via GUIs and/or CLIs, and others simply were “managed” via vi or emacs. They all share one thing in common, however – they all suck in one way or another. Why? Because configuration subsystems are incredibly difficult to get right.

On the surface, building a configuration system seems boring. If …

» Continue reading