Swimming with Dolphins
Last week, Splunk posted its MySQL Connector on Splunkbase. The release may have escaped your notice, so I’ll describe what we’ve shipped, and why this is so significant. In fact, by the end of this blog, you’ll realize that to call it a “connector” is a modest understatement of what it actually is.
To start, there’s been a natural tension between the SQL and NoSQL worlds. Some have gone so far as to proclaim the end of SQL, which is premature. SQL is still a key data technology, and will continue to be one. The world of data technologies, however, is expanding. Rather than replacing incumbents, new technologies are increasingly diversifying the landscape, opening up possibilities that were once constrained by rigid paradigms. Going forward, it’s going to be a mixed world.
Splunk, at its core, is NoSQL, and like other NoSQL technologies, we acknowledge that SQL plays a significant part in the total data picture for an enterprise.
There are two reasons for providing the MySQL Connector:
- Customers have been asking for it, and
- Splunk’s inherent architecture lends itself to extension
Meeting Customers’ Demands
Ever since we came out with the search lookup feature a couple of years ago, customers have been using it to join Splunk data with tabular data. However, that tabular data lived in CSV files, and soon customers were asking to keep it in an RDBMS instead.
Now, with the MySQL Connector, a customer can store lookup tables in MySQL rather than in flat files. This allows for central management of data that is both transactional and easy to maintain. So, for instance, if you have user IDs in your log data but user names in MySQL (or similarly, product IDs, zip-code-to-city/state mappings, etc.), you can join that data with search results against the logs to produce complete, human-readable reports. And if you run Splunk with several indexer nodes and search heads, there’s no need to replicate CSV data across the nodes; it’s all managed in MySQL. This is particularly valuable when working with large data sets.
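To make the idea concrete, here is a minimal sketch in Python of what a lookup-style join does: enriching log events that carry a user ID with human-readable names from a reference table. This is purely illustrative — the field names and data are made up, and in the Connector scenario the table would live in MySQL rather than in an in-memory dict.

```python
# Minimal sketch of a lookup-style join (illustrative only; not Splunk code).
# Log events carry a user_id; the lookup table maps user_id -> user_name.

log_events = [
    {"user_id": "1001", "action": "login"},
    {"user_id": "1002", "action": "purchase"},
]

# In the Connector scenario, this mapping would live in a MySQL table.
lookup_table = {"1001": "alice", "1002": "bob"}

def enrich(events, table):
    """Join each event against the lookup table, adding a user_name field."""
    return [
        {**event, "user_name": table.get(event["user_id"], "unknown")}
        for event in events
    ]

for row in enrich(log_events, lookup_table):
    print(row)
```

The point of centralizing the table is that every indexer and search head resolves IDs against the same source of truth instead of its own CSV copy.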
Why MySQL? There’s nothing special about MySQL, other than that it’s pervasive and easy to set up. Technology-wise, extensibility is inherent in Splunk’s architecture, so integrating with other databases is just as simple.
Lookups and Mashups
We touched a bit on the genesis of the MySQL Connector: Splunk lookups, which can be described as a join operation, performed within Splunk, over data from both Splunk and MySQL. Under the covers, the command issued to MySQL is a SELECT. The cool thing, however, is that we’ve gone beyond just Splunk lookups: we’ve surfaced SQL itself, so you can execute it from Splunk search. Splunk becomes a full SQL client, so not just SELECTs but also INSERTs and UPDATEs are readily available, and since the database metadata is available to us, you can do things like browse tables and other attributes of the database.
This makes managing lookup tables even easier. One problem with lookup tables in CSV, for instance, is that you have to manage the data outside of a database. With the Connector, you can update the table in place, and thanks to the extensible nature of the search language, you can stream the output of a Splunk search to populate those tables.
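The SQL operations involved here are ordinary ones. As a rough sketch of the SELECT/INSERT round trip, using Python’s built-in sqlite3 purely as a stand-in for MySQL (the table and column names are hypothetical):

```python
import sqlite3

# sqlite3 stands in for MySQL here, just to illustrate the SQL round trip.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY, user_name TEXT)")

# Populating the table -- analogous to streaming Splunk search output into MySQL.
search_results = [("1001", "alice"), ("1002", "bob")]
conn.executemany("INSERT INTO users VALUES (?, ?)", search_results)

# A lookup is a SELECT under the covers: resolve a user_id seen in the logs.
row = conn.execute(
    "SELECT user_name FROM users WHERE user_id = ?", ("1001",)
).fetchone()
print(row[0])  # alice
```

Because the table is transactional, updates from one search are immediately visible to every subsequent lookup, with no CSV files to redistribute.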
With the Connector, you can actually mash up Search and SQL in the same invocation by using the mysqlquery command (which is implemented as a custom search command):
... | mysqlquery spec=MyDb query="SELECT * FROM MyTable;"
For more info on how you can utilize SQL within Splunk Search, see: http://docs.splunk.com/Documentation/MySQL/latest/User/QueryMySQLdatabasetableswithmysqlquery
To re-emphasize: this is not just a feature for better lookups; it opens up MySQL fully to Splunk operations of every kind. For example, what if you wanted a Splunk alert to result in a ticket being opened, kicking off a resolution workflow? One could imagine populating an “inbox” of requests with data from the alert, including all the data acquired from Splunk’ed logs.
Splunk’s Extensible Data Platform
If I were to list Splunk’s underutilized capabilities, top of the list would be custom search commands. In short, Splunk’s search language is pipelined: operations are sequenced, and a single “search” invocation can chain several operations together. The flexibility of this architecture lends itself to user extension, since the output of one command is the input of the next. (For more details, check out the highly detailed docs here: http://docs.splunk.com/Documentation/Splunk/latest/developer/searchscripts) This makes it simple to integrate any data source with a remote API into the search pipeline, enabling a form of data integration that is simple yet powerful.
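To give a feel for the pipeline shape, here is a stripped-down sketch of what a streaming custom search command looks like: events come in as rows, a field is transformed, and rows go back out. This omits the header and settings handling a real Splunk search script does (via its Intersplunk helpers), and the field names are invented for illustration.

```python
import csv
import io

def uppercase_host(instream, outstream):
    """Toy 'streaming command': read events as CSV, transform one field,
    write CSV back out. Real Splunk search scripts follow this same
    read-transform-write shape, with extra protocol handling around it."""
    reader = csv.DictReader(instream)
    writer = csv.DictWriter(outstream, fieldnames=reader.fieldnames)
    writer.writeheader()
    for event in reader:
        event["host"] = event["host"].upper()  # the command's one transformation
        writer.writerow(event)

# Simulated pipeline input: two events, each with host and status fields.
events_in = io.StringIO("host,status\nweb-01,200\ndb-01,500\n")
events_out = io.StringIO()
uppercase_host(events_in, events_out)
print(events_out.getvalue())
```

Because every command in the pipeline consumes and produces the same tabular shape, a command like this can sit anywhere in a search — which is exactly the property the mysqlquery command exploits.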
The MySQL Connector is built entirely using standard Splunk extension mechanisms. In actuality, anyone could have built it, and the hope is that many more integrations will come in the future.
As the adage goes, every journey starts with a first step, and that’s what this is: a journey. There is a plethora of data sources, new and old, that can be combined, and it’s fascinating to think of the potential here and what lies in the months ahead as we go down this path.