Caching Hadoop Data with Splunk and Hunk

Update 9/27/16: As of Sept. 27, 2016, Hunk functionality has been incorporated into the Splunk Analytics for Hadoop Add-On and Splunk Enterprise versions 6.5 and later.

Although Hadoop is good at processing large amounts of data, it is not the fastest platform. Below is a list of options that Splunk and Hunk offer to speed up the retrieval of results and lower the processing overhead on Hadoop.

Each option has its own advantages:



1) Hunk Report Acceleration

This option caches the results in HDFS and keeps them fresh and current. By default, Hunk checks for new Hadoop data every 10 minutes.
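To make the "cache plus periodic check" idea concrete, here is a toy Python model of a periodically refreshed result cache. It is not Hunk's implementation: the AcceleratedReport class is hypothetical, an in-memory list stands in for data landing in HDFS, the cached summary lives in memory rather than in HDFS, and it assumes the cache is updated incrementally (each check processes only records added since the previous check). The 10-minute interval mirrors the default stated above.

```python
import time
from collections import Counter
from typing import Callable, List, Optional

# Mirrors the 10-minute default check interval described above.
CHECK_INTERVAL_SECONDS = 10 * 60


class AcceleratedReport:
    """Hypothetical toy model of a periodically refreshed, incrementally built result cache."""

    def __init__(
        self,
        source: List[str],
        interval: float = CHECK_INTERVAL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.source = source  # append-only list standing in for new data landing in HDFS
        self.interval = interval
        self.clock = clock
        self.summary: Counter = Counter()  # cached result: event counts by value
        self.offset = 0  # number of source records already folded into the summary
        self.last_check: Optional[float] = None

    def _refresh(self) -> None:
        # Process only the records added since the previous check.
        new_records = self.source[self.offset:]
        self.summary.update(new_records)
        self.offset += len(new_records)
        self.last_check = self.clock()

    def results(self) -> Counter:
        # Serve the cached summary; refresh first if the check interval has elapsed.
        now = self.clock()
        if self.last_check is None or now - self.last_check >= self.interval:
            self._refresh()
        return Counter(self.summary)


# Usage with a fake clock so the example runs instantly.
fake_now = [0.0]
events = ["error", "info"]
report = AcceleratedReport(events, clock=lambda: fake_now[0])
print(dict(report.results()))  # {'error': 1, 'info': 1}
events.append("error")
fake_now[0] = 300.0  # 5 minutes later: still served from the cache
print(dict(report.results()))  # {'error': 1, 'info': 1}
fake_now[0] = 600.0  # 10 minutes later: the new record is folded in
print(dict(report.results()))  # {'error': 2, 'info': 1}
```

The trade-off the sketch shows is that reads between checks cost nothing extra, at the price of results being up to one interval out of date.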



2) Hunk Scheduled Searches

This option caches the results on the …


Hunk Setup using Hortonworks Hadoop Sandbox

Update 9/27/16: As of Sept. 27, 2016, Hunk functionality has been incorporated into the Splunk Analytics for Hadoop Add-On and Splunk Enterprise versions 6.5 and later.

Hortonworks Sandbox is a personal, portable Hadoop environment that comes with a dozen interactive Hadoop examples. Recently, Hortonworks and Splunk released a tutorial and video showing how to install Hunk and connect it to the Hortonworks Hadoop Sandbox version 1.3.

This blog post summarizes the configurations used as part of the Hunk setup.

Configurations for Hadoop Provider:

Key               Value
Java Home         /usr/jdk/jdk1.6.0_31
Hadoop Home       /usr/lib/hadoop
Hadoop Version    Hadoop version 1.x (MR1)
Job Tracker       sandbox:50300
File System       hdfs://sandbox:8020
Splunk search recordreader,


Configurations for Hadoop Virtual Indexes:

Key                    Value
Name                   hadoop_sports
Path to data in HDFS   /user/hue/raanan/…
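As a rough illustration of how the two tables map onto configuration, the Python sketch below renders them as indexes.conf-style stanzas. It assumes the Hunk setting names vix.family, vix.env.JAVA_HOME, vix.env.HADOOP_HOME, vix.mapred.job.tracker, vix.fs.default.name, vix.provider and vix.input.1.path; the provider name hdp_sandbox and the render_stanza helper are hypothetical. The Hadoop Version row and the recordreader value are not mapped, and the truncated HDFS path is kept as a placeholder. A working provider may also need settings the tables do not list, such as an HDFS working directory.

```python
from typing import Dict


def render_stanza(name: str, settings: Dict[str, str]) -> str:
    """Render one indexes.conf-style stanza: a [name] header followed by 'key = value' lines."""
    lines = [f"[{name}]"]
    lines += [f"{key} = {value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"


PROVIDER_NAME = "hdp_sandbox"  # hypothetical provider name

# Hadoop provider: values from the provider table above. The recordreader value
# is cut off in the original post, so it is deliberately left out.
provider = {
    "vix.family": "hadoop",
    "vix.env.JAVA_HOME": "/usr/jdk/jdk1.6.0_31",
    "vix.env.HADOOP_HOME": "/usr/lib/hadoop",
    "vix.mapred.job.tracker": "sandbox:50300",  # MR1 JobTracker
    "vix.fs.default.name": "hdfs://sandbox:8020",
}

# Virtual index: points at the provider and at the data directory in HDFS.
# The path is truncated in the original post; replace it with the full path.
virtual_index = {
    "vix.provider": PROVIDER_NAME,
    "vix.input.1.path": "/user/hue/raanan/...",
}

config = (
    render_stanza(f"provider:{PROVIDER_NAME}", provider)
    + "\n"
    + render_stanza("hadoop_sports", virtual_index)
)
print(config)
```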

Splunk Hadoop Connect 1.1 – Opening the door to MapR; now available on all Hadoop distributions

I am happy to announce that Splunk Hadoop Connect 1.1 is now available. This version of Hadoop Connect rounds out Splunk's integration with the Hadoop distributions by becoming certified on MapR. Users of the Cloudera, Hortonworks, and Apache Hadoop distributions can also benefit from the power of Splunk.

Splunk Hadoop Connect provides bi-directional integration to easily and reliably move data between Splunk and Hadoop. It gives Hadoop users real-time analysis, visualization, and role-based access control for streams of machine-generated data. It delivers three core capabilities: exporting data from Splunk to Hadoop, exploring Hadoop directories, and importing data from Hadoop into Splunk.

The most significant new feature added to version 1.1 is the …


Hadoop and Splunk Use Cases

Customer Examples – Using both Splunk and Hadoop

The Splunk and Hadoop communities can benefit from each other’s strengths. Below are several examples of customers that use both environments.

Use case: Description
1 – Splunk then Hadoop: Splunk collects, visualizes, and analyzes the data and passes it to Hadoop for ETL and other batch processing.
2 – Hadoop then Splunk: Hadoop collects the data and passes the results to Splunk for visualization.
3 – Data flows in both directions: Splunk and Hadoop collect different artifacts and share the data that Hadoop needs for ETL or batch analytics and that Splunk needs for real-time analysis and visualization.
4 – Side-by-side: Both Splunk and Hadoop are used by the organization, but are used …

Do you Hadoop? How Splunk Can Help

Splunk is providing two applications to integrate Splunk with Hadoop: Splunk Hadoop Connect and the Splunk App for HadoopOps.

These two integrations address two major issues with Hadoop. First, developing Hadoop applications is time-consuming. As a result, most Hadoop-related projects take a long time to develop and, once developed, still require specialized knowledge to adapt to new requirements. Second, monitoring a Hadoop stack across multiple servers can be extremely complex and time-consuming. As a result, critical problems in Hadoop environments often recur and remain unresolved.

Splunk Hadoop Connect, the Splunk App for HadoopOps, and Shuttl (which archives Splunk files to Hadoop) together provide a complete integration with Hadoop.

Splunk Hadoop Connect

Splunk Hadoop …
