Caching Hadoop Data with Splunk and Hunk
Although Hadoop is good at processing large amounts of data, it is not the fastest platform. Below is a list of options that Splunk and Hunk offer to speed up the retrieval of results and lower the processing overhead on Hadoop.
Each option has its own advantages:
1) Hunk Report Acceleration
This option caches the results in HDFS and keeps them fresh and current. By default, Hunk checks for new Hadoop data every 10 minutes.
2) Hunk Scheduled Searches
This option caches the results on the Hunk node and keeps them available on the search head for twice the schedule interval. For example, if you schedule the search to run every 4 hours, the results …
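The doubling rule for scheduled searches can be sketched in a few lines of Python. This is only an illustration, not Hunk code: the function names and the `multiplier` parameter are hypothetical, and because the example sentence above is cut off, the figures follow purely from the stated rule that results are kept for twice the schedule interval.

```python
from datetime import timedelta


def cached_result_lifetime(schedule_interval: timedelta, multiplier: int = 2) -> timedelta:
    """Return how long one run's results stay available on the search head.

    Assumption: the lifetime is `multiplier` times the schedule interval;
    multiplier=2 models the "double" rule described in the text.
    """
    if schedule_interval <= timedelta(0):
        raise ValueError("schedule_interval must be positive")
    return schedule_interval * multiplier


def overlap_window(schedule_interval: timedelta, multiplier: int = 2) -> timedelta:
    """Return how long the previous run's results remain cached after the next run starts."""
    return cached_result_lifetime(schedule_interval, multiplier) - schedule_interval


if __name__ == "__main__":
    every_four_hours = timedelta(hours=4)
    print("results kept for:", cached_result_lifetime(every_four_hours))
    print("overlap with next run:", overlap_window(every_four_hours))
```

Under this reading, each run's results outlive the start of the next run by one full interval, so a search that is still running does not leave the search head without cached results.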
Hunk Setup using Hortonworks Hadoop Sandbox
Hortonworks Sandbox is a personal, portable Hadoop environment that comes with a dozen interactive Hadoop examples. Recently, Hortonworks and Splunk released a tutorial and video showing how to install Hunk and connect it to the Hortonworks Hadoop Sandbox version 1.3.
This blog summarizes the configurations used as part of the Hunk setup.
Configurations for Hadoop Provider:
|Hadoop Version||Hadoop version 1.x (MR1)|
|Splunk search recordreader||com.splunk.mr.input.SimpleCSVRecordReader, com.splunk.mr.input.ValueAvroRecordReader|
Configurations for Hadoop Virtual Indexes:
|Path to data in HDFS||/user/hue/raanan/…|
For more Hunk details and examples go to the blog:
Splunk Hadoop Connect 1.1 – Opening the door to MapR; now available on all Hadoop distributions
I am happy to announce that Splunk Hadoop Connect 1.1 is now available. This version of Hadoop Connect rounds out Splunk’s integration with the Hadoop distributions by becoming certified on MapR. Cloudera, Hortonworks, and Apache Hadoop distributions also have the ability to benefit from the power of Splunk.
Splunk Hadoop Connect provides bi-directional integration to easily and reliably move data between Splunk and Hadoop. It gives Hadoop users real-time analysis, visualization, and role-based access control for a stream of machine-generated data. It delivers three core capabilities: exporting data from Splunk to Hadoop, exploring Hadoop directories, and importing data from Hadoop to Splunk.
The most significant new feature added to version 1.1 is the …
Hadoop and Splunk Use cases
Customer Examples – Using both Splunk and Hadoop
The Splunk and Hadoop communities can benefit from each other’s strengths. Below are several examples of customers that use both environments.
|1 – Splunk then Hadoop||Splunk collects, visualizes, and analyzes the data and passes it to Hadoop for ETL and other batch processing|
|2 – Hadoop then Splunk||Hadoop collects the data and passes the results to Splunk for visualization|
|3 – Data flows in both directions||Splunk and Hadoop collect different artifacts and share the data that Hadoop needs for ETL or batch analytics and Splunk needs for real-time analysis and visualization|
|4 – Side-by-Side||Both Splunk and Hadoop are used by the organization, but are used|
Do you Hadoop? How Splunk Can Help
Splunk provides two applications to integrate Splunk with Hadoop: Splunk Hadoop Connect and the Splunk App for HadoopOps.
These two integrations address two major issues with Hadoop. The first is that developing Hadoop applications is time-consuming. As a result, most Hadoop-related projects take a long time to develop and, once developed, still require specialized knowledge to adapt to new requirements. The second is that monitoring a Hadoop stack across multiple servers can be extremely complex and time-consuming. As a result, critical problems in Hadoop environments often recur and remain unresolved.
Splunk Hadoop Connect, the Splunk App for HadoopOps, and Shuttl (which archives Splunk files to Hadoop) together provide a complete integration with Hadoop.
Splunk Hadoop Connect
Splunk Hadoop …