Caching Hadoop Data with Splunk and Hunk
Although Hadoop is good at processing large amounts of data, it is not the fastest platform. Below is a list of options that Splunk and Hunk offer to speed up the retrieval of results and lower the processing overhead on Hadoop.
Each option has its own advantages:
1) Hunk Report Acceleration
This option caches the results in HDFS and keeps them fresh and current. By default, Hunk checks for new Hadoop data every 10 minutes.
2) Hunk Scheduled Searches
This option caches the results on the Hunk node and makes them available on the search head for twice the schedule interval. For example, if you schedule the search to run every 4 hours, the results are kept in cache for 8 hours.
3) Hunk Summary Indexing
This option allows you to create a small summary index on the Hunk node. You can then run searches and reports on this summary index.
4) Static Reports
This option lets you generate a static report and view it without any overhead on Splunk or Hunk.
5) Hadoop Connect Import (part of the Hadoop Connect App)
This option allows you to import data from HDFS into a Splunk indexer. Whenever new data arrives in HDFS, it is automatically copied to Splunk.
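To make the retention rule for scheduled searches (option 2) concrete, here is a minimal Python sketch of the arithmetic. It is only an illustration of the rule as described above, not Hunk code: the function name `cache_lifetime` and the `multiplier` parameter are hypothetical, and the factor of 2 is taken from the 4-hour/8-hour example.

```python
from datetime import timedelta


def cache_lifetime(schedule_interval: timedelta, multiplier: int = 2) -> timedelta:
    """Estimate how long scheduled-search results stay cached.

    Assumption: results are kept for `multiplier` times the schedule
    interval (2x, per the example in the text). Hypothetical helper,
    not part of any Splunk or Hunk API.
    """
    if schedule_interval <= timedelta(0):
        raise ValueError("schedule_interval must be positive")
    return schedule_interval * multiplier


# A search scheduled every 4 hours keeps its results for 8 hours.
lifetime = cache_lifetime(timedelta(hours=4))
print(lifetime)  # 8:00:00
```

The practical consequence is that a user opening the report between runs still sees cached results, since the cache outlives the gap between two consecutive scheduled executions.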