Splunk Hadoop Connect 1.1 – Opening the door to MapR; now available on all Hadoop distributions

I am happy to announce that Splunk Hadoop Connect 1.1 is now available. This version of Hadoop Connect rounds out Splunk’s integration with the Hadoop distributions by becoming certified on MapR. Cloudera, Hortonworks, and Apache Hadoop distributions also have the ability to benefit from the power of Splunk.

Splunk Hadoop Connect provides bi-directional integration to easily and reliably move data between Splunk and Hadoop. It gives Hadoop users real-time analysis, visualization and role-based access control for a stream of machine-generated data. It delivers three core capabilities: Export data from Splunk to Hadoop, Explore Hadoop directories and Import data from Hadoop to Splunk.

The most significant new feature added…

» Continue reading

Shuttl – A New Year a New Release

Data is the lifeblood of the modern business. Managing the flow of data, however, is as important as the data itself. That is why Shuttl was created. Through Shuttl users can move (nay, shuttl!) buckets of data from Splunk to other systems and back again. This has proved immensely useful as people realize how data can be used and reused to drive business value.

Happy New Year 2013!

The Elves have been busy at work bringing Shuttl users a bunch of goodies in the form of the new 0.7.2 Release. Christmas came early when the code landed in Master on GitHub 6 days before Santa’s big night, and now it’s available for download on Splunkbase!

Since Shuttl’s release last year,…

» Continue reading

Hadoop and Splunk Use cases

Customer Examples – Using both Splunk and Hadoop

The Splunk and Hadoop communities can benefit from each other’s strengths. Below are several examples of customers that use both environments.

Use Case | Description
1 – Splunk then Hadoop | Splunk collects, visualizes, and analyzes the data and passes it to Hadoop for ETL and other batch processing
2 – Hadoop then Splunk | Hadoop collects the data and passes the results to Splunk for visualization
3 – Data flows in both directions | Splunk and Hadoop collect different artifacts and share the data that Hadoop needs

» Continue reading

Simplifying Big Data Analytics

Most analytics and data projects have started thinking of investing in big data initiatives. With so much buzz about big data, organizations have started investing or are thinking of investing in Hadoop. While it is great to stay on top of trends, it often ends up being another investment where the full benefit and potential is simply not realized. The learning curve is too steep and the time to implement too high. Current analytics resources lack the strong programming skills required to conduct even simple analysis tasks and activities using Hadoop. In this post, I would like to focus on providing a better understanding of what types of analysis are better suited for Hadoop vs. non-Hadoop technologies in order to simplify…

» Continue reading

Building your big data reference architecture

With all of the value now being placed on data and the ability to use that data to improve customer experience, optimize revenue and enable business growth, finding a way to ingest and save the data is critical. While there is a lot of advertising and press about many solutions’ ability to address any need of the enterprise, where does a CXO turn to figure it all out? In the past 2+ years I have evaluated solutions in the “big data” space to address all of the problems the IT and business users threw at me. In all of the evaluation, testing and validation of products I found that there is no single solution now or…

» Continue reading

Unlocking Splunk Data with Shuttl

Shuttl is being featured at Splunk’s Worldwide Users’ Conference 2012. I’ve talked about the benefits of Shuttl for efficiently and scalably bulk-moving Splunk data to HDFS for Archiving in a past blog announcing its availability, and here I’ll expand on how it enables the emerging theme of Big Data Integration.

Big Data Integration

In the big data space, the diversity of technologies is not only huge, but fast changing. Every time I hear about a new technology, the first thing I think of is, “How will it integrate with other data technologies?”

Despite much of the discussion about big data having to do with volume, latency, scalability, availability, consistency, flexibility, etc., it seems only when real…

» Continue reading

Do you Hadoop? How Splunk Can Help

Splunk is providing two applications to integrate Splunk with Hadoop: Splunk Hadoop Connect and the Splunk App for HadoopOps.

These two integrations address two major issues with Hadoop. One issue is that developing Hadoop applications is time-consuming. As a result, most Hadoop-related projects take a long time to develop and, once developed, still require specialized knowledge to adapt to new requirements. Another issue is that monitoring a Hadoop stack across multiple servers can be extremely complex and time-consuming. As a result, critical problems in Hadoop environments will often recur and remain unresolved.

Splunk Hadoop Connect, Splunk App for HadoopOps, and Shuttl (archives Splunk files to Hadoop) provide a complete integration to Hadoop.

Splunk Hadoop Connect

Splunk Hadoop…

» Continue reading

Shuttl for Big Data Archiving

As I mentioned in my last blog, archiving for big data is important. If you haven’t already, please read it before going on. If you have already read it, read it again. It’s important.

Are you back? OK.

Now, as I mentioned, archiving has pitfalls and challenges, and people typically custom script solutions for it. Here’s a recap of some of the challenges of a good archiving solution:

  1. Data loss
  2. Organizing data
  3. Pluggable backend support
  4. Search for what’s been archived
  5. Selective “thaw” of frozen buckets
  6. Flushing of thawed buckets

So, how do you meet these challenges? I’m so glad you asked.

Announcing Shuttl!

Shuttl is an open source…

» Continue reading

Analytics Staffing for Big Data: A Perspective

A couple of weeks ago, we talked about the need to invest appropriately in people when you invest in technology. I wanted to continue the discussion and focus on the new area of “Big Data” – more specifically the analyst who works on big data – the “Data Scientist” and the data analyst.

I love the term “data scientist”. It has finally made the data junkie’s job title more glamorous. It has given both name and fame to the role. Well, everyone is talking about “big data”. Many organizations think hiring a data scientist is a requirement for solving all “big data” problems and that the only analysts required for a big data problem are data scientists. If you have invested in…

» Continue reading

Big Data Thoughts…

It happens to me quite a bit that I hear a song and then it keeps playing in my head. My 4-year-old is notorious for singing the same song over and over, and then I find myself humming it during my long train ride to work.

Sometimes, it happens at work – you hear a thing and you keep hearing about the same thing in almost every conversation.  I am sure you have had those times too.  A number of you will have had days or weeks when you have had some discussion on “big data”.

For the last three weeks, I have had a number of conversations on the topic of big data. Strata,

» Continue reading