Using Splunk Archive Bucket Reader with Pig

Update 9/27/16: As of Sept. 27, 2016, Hunk functionality has been incorporated into the Splunk Analytics for Hadoop Add-On and Splunk Enterprise versions 6.5 and later.

This is part II in a series of posts about how to use the Splunk Archive Bucket Reader. For information about installing the app and using it to obtain jar files, please see the first post in this series.

In this post I want to show how to use Pig to read archived Splunk data. Unlike Hive, Pig cannot be directly configured to use InputFormat classes. However, Pig provides a Java interface—LoadFunc—that makes it reasonably easy to use an arbitrary InputFormat with just a small amount of Java code. A LoadFunc is provided …

» Continue reading

Splunk Archive Bucket Reader and Hive

Update 9/27/16: As of Sept. 27, 2016, Hunk functionality has been incorporated into the Splunk Analytics for Hadoop Add-On and Splunk Enterprise versions 6.5 and later.

This year was my first .conf, and it was an amazingly fun experience! During the keynote, we announced a number of new Hunk features, one of which was the Splunk Archive Bucket Reader. This tool allows you to read Splunk raw data journal files using any Hadoop application that allows the user to configure which InputFormat implementation is used. In particular, if you are using Hunk archiving to copy your indexes onto HDFS, you can now query and analyze the archived data from those indexes using whatever your organization’s favorite Hadoop applications are (e.g. …

» Continue reading