Get Value Out of Your Data in Hadoop, Starting Today
For years we’ve been working with thousands of companies using Splunk for big data solutions that range from security to business analytics and everything in between. The best part is our customers often discover exciting ways to use Splunk and teach us what the product can really do. As you can imagine, all of the customer conversations, product implementations and ROI stories have given Splunk a treasure trove of experience with big data and big data solutions.
So when our customers let us know that getting large amounts of data into Hadoop is straightforward, but getting analytics out is the challenge, we knew there had to be a better way. Customers asked us to make it faster and easier for …
Hunk Setup using Hortonworks Hadoop Sandbox
Hortonworks Sandbox is a personal, portable Hadoop environment that comes with a dozen interactive Hadoop examples. Recently, Hortonworks and Splunk released a tutorial and video showing how to install Hunk and connect it to the Hortonworks Hadoop Sandbox version 1.3.
This blog summarizes the configurations used as part of the Hunk setup.
Configurations for Hadoop Provider:
Hadoop version: 1.x (MR1)
Splunk search recordreader: com.splunk.mr.input.SimpleCSVRecordReader, com.splunk.mr.input.ValueAvroRecordReader
Configurations for Hadoop Virtual Indexes:
Path to data in HDFS: /user/hue/raanan/…
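Hunk stores provider and virtual index settings in indexes.conf using `vix.`-prefixed keys. The sketch below shows roughly how the settings above might look in such a file; the stanza names, JAVA_HOME/HADOOP_HOME paths, and host/port values are illustrative assumptions, not taken from the tutorial.

```ini
# Hypothetical indexes.conf sketch for a Hunk Hadoop provider and virtual
# index. Paths, hostnames and ports below are placeholders for illustration.
[provider:hortonworks-sandbox]
vix.family = hadoop
vix.env.JAVA_HOME = /usr/jdk/jdk1.6.0_31
vix.env.HADOOP_HOME = /usr/lib/hadoop
# Hadoop 1.x (MR1) cluster endpoints -- adjust to your Sandbox VM.
vix.fs.default.name = hdfs://sandbox:8020
vix.mapred.job.tracker = sandbox:50300
# Record readers used by Splunk searches, as listed above.
vix.splunk.search.recordreader = com.splunk.mr.input.SimpleCSVRecordReader,com.splunk.mr.input.ValueAvroRecordReader

[sandbox-virtual-index]
vix.provider = hortonworks-sandbox
# Path to data in HDFS, as configured above.
vix.input.1.path = /user/hue/raanan/...
```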
For more Hunk details and examples go to the blog:
Big data and financial services – an EMEA perspective
I was lucky enough to attend the first day of the “Big Data in Financial Services” event in London a few days ago. I know some people might not think of that as lucky, but I say it because of a surprisingly varied agenda, entertaining speakers and a lot of good debate and content on what big data means to FS companies and how they are using it.
The key point I took away was that right now, FS companies are using big data to focus on operational issues – risk, efficiency, compliance, security and making better decisions. However, there is a growing trend of FS companies looking at how big data is going …
Further Simplifying Big Data Analytics
In the past we’ve talked about simplifying big data analytics and the 80:20 rule for data analysis. Most organizations spend 80% of analytics efforts running and optimizing the business and 20% on advanced analytics, which includes advanced data mining, algorithm development and advanced predictive modeling.
Hadoop has seen very good adoption for big data analytics, specifically batch analytics for large datasets, and many organizations have initiatives to use it for advanced analytics and optimizing the business. Unfortunately, many of those organizations are struggling to derive value from their Hadoop implementations. They’re finding that analysis takes too long and requires specialized talent. Another issue is that getting data into Hadoop is difficult, and getting meaningful analysis out is even more challenging.
In the past few months, …
Splunk Hadoop Connect 1.1 – Opening the door to MapR; now available on all Hadoop distributions
I am happy to announce that Splunk Hadoop Connect 1.1 is now available. This version of Hadoop Connect rounds out Splunk’s integration with the Hadoop distributions by becoming certified on MapR. Users of the Cloudera, Hortonworks, and Apache Hadoop distributions can also benefit from the power of Splunk.
Splunk Hadoop Connect provides bi-directional integration to easily and reliably move data between Splunk and Hadoop. It gives Hadoop users real-time analysis, visualization and role-based access control for streams of machine-generated data. It delivers three core capabilities: exporting data from Splunk to Hadoop, exploring Hadoop directories and importing data from Hadoop to Splunk.
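Once Hadoop Connect has exported events into HDFS, downstream batch jobs can process them like any other file. The sketch below assumes the export landed as CSV with `_time`, `host` and `_raw` columns; those field names and the sample data are illustrative assumptions, not something the app guarantees.

```python
import csv
import io
from collections import Counter

# Illustrative sample of what Splunk-exported events might look like once
# landed in HDFS as CSV; the column names (_time, host, _raw) are assumed.
SAMPLE = """\
_time,host,_raw
1388534400,web01,GET /index.html 200
1388534401,web02,GET /login 500
1388534402,web01,POST /login 200
"""

def events_per_host(csv_text):
    """Count exported events per host -- the kind of simple batch
    aggregation one might run over Hadoop Connect export files."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["host"]] += 1
    return dict(counts)

print(events_per_host(SAMPLE))  # -> {'web01': 2, 'web02': 1}
```

In a real cluster this logic would run inside a MapReduce or similar batch job over the exported directory rather than over an in-memory string.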
The most significant new feature added to version 1.1 is the …
Shuttl – A New Year a New Release
Data is the lifeblood of the modern business. Managing the flow of data, however, is as important as the data itself. That is why Shuttl was created. Through Shuttl, users can move (nay, shuttl!) buckets of data from Splunk to other systems and back again. This has proved immensely useful as people realize how data can be used and reused to drive business value.
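The “buckets” Shuttl moves are Splunk index directories, conventionally named `db_<latestTime>_<earliestTime>_<bucketId>` with times as epoch seconds. The sketch below parses that naming scheme to decide whether a bucket is old enough to archive; the cutoff logic is an invented illustration, not Shuttl’s own policy.

```python
import re

# Splunk warm/cold bucket directories are conventionally named
# db_<latestTime>_<earliestTime>_<bucketId>, with times in epoch seconds.
BUCKET_RE = re.compile(r"^db_(\d+)_(\d+)_(\d+)$")

def parse_bucket(name):
    """Extract the time range and id encoded in a bucket directory name."""
    m = BUCKET_RE.match(name)
    if not m:
        raise ValueError(f"not a bucket directory: {name}")
    latest, earliest, bucket_id = map(int, m.groups())
    return {"latest": latest, "earliest": earliest, "id": bucket_id}

def older_than(name, cutoff_epoch):
    """True if every event in the bucket predates the cutoff -- an
    illustrative archiving test, not Shuttl's actual selection logic."""
    return parse_bucket(name)["latest"] < cutoff_epoch

print(parse_bucket("db_1388534400_1388448000_42"))
print(older_than("db_1388534400_1388448000_42", 1400000000))  # -> True
```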
The Elves have been busy at work bringing Shuttl users a bunch of goodies in the form of the new 0.7.2 release. Christmas came early when the code landed in master on GitHub six days before Santa’s big night, and now it’s available for download on Splunkbase!
Since Shuttl’s release last year, …
Hadoop and Splunk Use Cases
Customer Examples – Using both Splunk and Hadoop
The Splunk and Hadoop communities can benefit from each other’s strengths. Below are several examples of customers that use both environments.
1 – Splunk then Hadoop: Splunk collects, visualizes and analyzes the data, then passes it to Hadoop for ETL and other batch processing.
2 – Hadoop then Splunk: Hadoop collects the data and passes the results to Splunk for visualization.
3 – Data flows in both directions: Splunk and Hadoop collect different artifacts and share them, giving Hadoop the data it needs for ETL or batch analytics and Splunk the data it needs for real-time analysis and visualization.
4 – Side-by-side: Both Splunk and Hadoop are used by the organization, but are used …
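Pattern 3 above amounts to a routing decision per event. The toy sketch below illustrates that idea: each event always goes to the batch (Hadoop-like) path, and security-relevant events also go to the real-time (Splunk-like) path. The event types, predicates and sink names are invented for illustration.

```python
# Toy sketch of use case 3 ("data flows in both directions"): route each
# event to the real-time path and/or the batch path. All names here are
# hypothetical, not from any Splunk or Hadoop API.
def route(event, realtime_sink, batch_sink):
    # Security-relevant events need real-time analysis and alerting.
    if event.get("type") in {"auth", "error"}:
        realtime_sink.append(event)
    # Everything is retained for batch ETL and historical analytics.
    batch_sink.append(event)

realtime, batch = [], []
events = [
    {"type": "auth", "user": "alice"},
    {"type": "click", "page": "/home"},
    {"type": "error", "code": 500},
]
for e in events:
    route(e, realtime, batch)

print(len(realtime), len(batch))  # -> 2 3
```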
Simplifying Big Data Analytics
Most analytics and data teams have started thinking about investing in big data initiatives. With so much buzz about big data, organizations have started investing, or are thinking of investing, in Hadoop. While it is great to stay on top of trends, it often ends up being another investment whose full benefit and potential is simply not realized. The learning curve is too steep and the time to implement too long. Current analytics staff lack the strong programming skills required to conduct even simple analysis tasks and activities using Hadoop. In this post, I would like to focus on providing a better understanding of what types of analysis are better suited for Hadoop vs. non-Hadoop technologies in order to simplify …
Building your big data reference architecture
With all of the value now being placed on data, and on the ability to use that data to improve customer experience, optimize revenue and enable business growth, finding a way to ingest and store the data is critical. With so much advertising and press about solutions claiming to address every need of the enterprise, where does a CXO turn to figure it all out? In the past 2+ years I have evaluated solutions in the “big data” space to address all of the problems the IT and business users threw at me. In all of that evaluation, testing and validation of products, I found that there is no single solution now or …
Unlocking Splunk Data with Shuttl
Shuttl is being featured at Splunk’s Worldwide Users’ Conference 2012. I’ve talked about the benefits of Shuttl for efficiently and scalably bulk-moving Splunk data to HDFS for archiving in a past blog post announcing its availability; here I’ll expand on how it enables the emerging theme of Big Data Integration.
Big Data Integration
In the big data space, the diversity of technologies is not only huge, but fast changing. Every time I hear about a new technology, the first thing I think of is, “How will it integrate with other data technologies?”
Despite much of the discussion about big data having to do with volume, latency, scalability, availability, consistency, flexibility, etc. it seems only when real projects are …