Caching Hadoop Data with Splunk and Hunk
Although Hadoop is good at processing large amounts of data, it is not the fastest platform. Below is a list of options that Splunk and Hunk offer to speed up the retrieval of results and lower the processing overhead on Hadoop.
Each option has its own advantages:
1) Hunk Report Acceleration
This option caches the results in HDFS and keeps them fresh and current. By default, Hunk checks for new Hadoop data every 10 minutes.
2) Hunk Scheduled Searches
This option caches the results on the Hunk node; they are available on the search head for double the schedule's interval. For example, if you schedule the search to run every 4 hours, the results …
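Both options are driven by saved-search settings. As a minimal sketch (the search name, search string, and cron schedule below are hypothetical; check exact setting names against your Splunk version), a scheduled, accelerated search in savedsearches.conf might look like:

```ini
[Hypothetical Hadoop Summary Search]
search = index=my_virtual_index | stats count by status
; Run every 4 hours
cron_schedule = 0 */4 * * *
enableSched = 1
; Report acceleration: build and maintain cached summaries
auto_summarize = 1
auto_summarize.dispatch.earliest_time = -7d
```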
Christmas 2020. Will big data and IoT change things for Father Christmas? Part II
In part 1 we discussed how Father Christmas is planning to use sensor data for the Internet of Toys.
In part 2 we’re going to discuss how he is going to use very large data sets to build out his Christmas 2020 technology strategy.
Big Data & Analytics
There’s a lot of information that goes into making Christmas a success. This data includes:
- Social media sentiment about good or naughty children
- Christmas present lists from children (both digitized scanned letters and increasingly electronic present lists)
- Data from toys and manufacturing equipment to spot patterns in quality control
- 500 years of Christmas Eve delivery data to help optimize sleigh route planning
- Reindeer biometric information to ensure optimum …
From big data to a 360 degree customer view with Hunk and Hortonworks
You can’t really escape the fact that we’re in the age of the customer. From CRM to the “long tail” to multi-channel to social media brand sentiment to Net Promoter Scores – it is all about customer experience. Big Data has an important part to play – no great revelation there, but how do you actually do it? There are an awful lot of questions that come up when it comes to Big Data and the customer view:
What should my architecture be? How do I put together the right data strategy for the short and long term? How do I get the value from the data? How do I build customer analytics on top of my data? How do I …
Get Value Out of Your Data in Hadoop, Starting Today
For years we’ve been working with thousands of companies using Splunk for big data solutions that range from security to business analytics and everything in between. The best part is our customers often discover exciting ways to use Splunk and teach us what the product can really do. As you can imagine, all of the customer conversations, product implementations and ROI stories have given Splunk a treasure trove of experience with big data and big data solutions.
So when our customers let us know that getting large amounts of data into Hadoop is straightforward, but getting analytics out is the challenge, we knew there had to be a better way. Customers asked us to make it faster and easier for …
Hunk Setup using Hortonworks Hadoop Sandbox
Hortonworks Sandbox is a personal, portable Hadoop environment that comes with a dozen interactive Hadoop examples. Recently Hortonworks and Splunk released a tutorial and video showing how to install and connect Hunk with the Hortonworks Hadoop Sandbox version 1.3.
This blog summarizes the configurations used as part of the Hunk setup.
Configurations for Hadoop Provider:
- Hadoop Version: Hadoop version 1.x (MR1)
- Splunk search recordreader: com.splunk.mr.input.SimpleCSVRecordReader, com.splunk.mr.input.ValueAvroRecordReader
Configurations for Hadoop Virtual Indexes:
- Path to data in HDFS: /user/hue/raanan/…
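These settings end up in indexes.conf on the Hunk search head. A minimal sketch, assuming a provider named "sandbox" (the hostnames, ports, Java/Hadoop paths, and data path below are placeholders, not values from the tutorial):

```ini
[provider:sandbox]
vix.family = hadoop
vix.env.JAVA_HOME = /usr/java/default
vix.env.HADOOP_HOME = /usr/lib/hadoop
; Hadoop 1.x (MR1): NameNode and JobTracker addresses (placeholders)
vix.fs.default.name = hdfs://sandbox.example.com:8020
vix.mapred.job.tracker = sandbox.example.com:50300
vix.splunk.home.hdfs = /user/hunk/workdir

[hunk_sandbox_index]
vix.provider = sandbox
; Path to the data in HDFS (placeholder; replace with your own path)
vix.input.1.path = /user/hue/...
```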
For more Hunk details and examples, see the blog.
Big data and financial services – an EMEA perspective
I was lucky enough to attend the first day of the “Big Data in Financial Services” event in London a few days ago. I know some people might not think of that as lucky but I say it on the back of a surprisingly varied agenda, entertaining speakers and a lot of good debate and content on what big data means to FS companies and how they are using it.
The key point I took away was that right now, FS companies are using big data to focus on operational issues – risk, efficiency, compliance, security and making better decisions. However, there is a growing trend of FS companies looking at how big data is going …
Further Simplifying Big Data Analytics
In the past we’ve talked about simplifying big data analytics and the 80:20 rule for data analysis. Most organizations spend 80% of analytics efforts running and optimizing the business and 20% on advanced analytics, which includes advanced data mining, algorithm development and advanced predictive modeling.
Hadoop has seen very good adoption for big data analytics, specifically batch analytics for large datasets, and many organizations have initiatives to use it for advanced analytics and optimizing the business. Unfortunately, those organizations are struggling to derive value from their Hadoop implementations. They’re finding that analysis takes too long and requires specialized talent. Another issue is that getting data into Hadoop is difficult, and getting meaningful analysis out of it is even more challenging.
In the past few months, …
Splunk Hadoop Connect 1.1 – Opening the door to MapR; now available on all Hadoop distributions
I am happy to announce that Splunk Hadoop Connect 1.1 is now available. This version of Hadoop Connect rounds out Splunk’s integration with the Hadoop distributions by becoming certified on MapR. The Cloudera, Hortonworks, and Apache Hadoop distributions can also benefit from the power of Splunk.
Splunk Hadoop Connect provides bi-directional integration to easily and reliably move data between Splunk and Hadoop. It gives Hadoop users real-time analysis, visualization and role-based access control for streams of machine-generated data. It delivers three core capabilities: exporting data from Splunk to Hadoop, exploring Hadoop directories, and importing data from Hadoop to Splunk.
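Conceptually, the three capabilities correspond to ordinary HDFS file operations. As a rough illustration only (Hadoop Connect has its own UI and configuration; the paths and helper below are hypothetical), the equivalent raw HDFS CLI invocations can be sketched as:

```python
# Build the raw HDFS CLI commands that correspond to Hadoop Connect's
# three capabilities (illustrative only; all paths are hypothetical).
def hdfs_cmd(action: str, *args: str) -> str:
    return " ".join(["hadoop", "fs", action, *args])

# Export: push Splunk-exported data into HDFS
export = hdfs_cmd("-put", "/opt/splunk/export/events.csv", "/user/splunk/export/")
# Explore: list a Hadoop directory
explore = hdfs_cmd("-ls", "/user/splunk/export/")
# Import: pull Hadoop results back for indexing in Splunk
import_ = hdfs_cmd("-get", "/user/splunk/results/part-00000", "/tmp/splunk-import/")

print(export)  # hadoop fs -put /opt/splunk/export/events.csv /user/splunk/export/
```

In practice Hadoop Connect handles these transfers reliably on a schedule, rather than as one-off shell commands.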
The most significant new feature added to version 1.1 is the …
Shuttl – A New Year a New Release
Data is the lifeblood of the modern business. Managing the flow of data, however, is as important as the data itself. That is why Shuttl was created. Through Shuttl, users can move (nay, shuttl!) buckets of data from Splunk to other systems and back again. This has proved immensely useful as people realize how data can be used and reused to drive business value.
The Elves have been busy at work bringing Shuttl users a bunch of goodies in the form of the new 0.7.2 Release. Christmas came early when the code landed in Master on Github 6 days before Santa’s big night, and now it’s available for download on Splunkbase!
Since Shuttl’s release last year, …
Hadoop and Splunk Use Cases
Customer Examples – Using both Splunk and Hadoop
The Splunk and Hadoop communities can benefit from each other’s strengths. Below are several examples of customers that use both environments.
- 1 – Splunk then Hadoop: Splunk collects, visualizes, and analyzes the data, then passes it to Hadoop for ETL and other batch processing
- 2 – Hadoop then Splunk: Hadoop collects the data, then passes the results to Splunk for visualization
- 3 – Data flows in both directions: Splunk and Hadoop collect different artifacts and share them – Hadoop gets the data it needs for ETL or batch analytics, and Splunk gets the data it needs for real-time analysis and visualization
- 4 – Side-by-Side: Both Splunk and Hadoop are used by the organization, but are used …