Parallel Data Transfer in Shuttl 0.8.0
In the world of Big Data, parallelism and distribution are key aspects of any successful technology. Shuttl was designed from the ground up with those things in mind by piggy-backing on Splunk’s architecture. Until now, moving data from Splunk to another data store was done in parallel, but many administrative and restoration actions remained cumbersome because one key aspect was missing: orchestration.
Shuttl 0.8.0 was released just two months after the prior 0.7.x release and includes such orchestration, resulting in three significant new features:
- Parallel List
- Parallel Data Pull
- Parallel Data Flush
Before we move on to the new features, if you aren’t familiar with Shuttl, I’d recommend reading the earlier blog posts on the topic (http://blogs.splunk.com/?s=shuttl).
Shuttl – A New Year a New Release
Data is the life blood of the modern business. Managing the flow of data, however, is as important as the data itself. That is why Shuttl was created. Through Shuttl users can move (nay, shuttl!) buckets of data from Splunk to other systems and back again. This has proved immensely useful as people realize how data can be used and reused to drive business value.
The Elves have been busy at work bringing Shuttl users a bunch of goodies in the form of the new 0.7.2 Release. Christmas came early when the code landed in Master on GitHub 6 days before Santa’s big night, and now it’s available for download on Splunkbase!
Since Shuttl’s release last year, …
Unlocking Splunk Data with Shuttl
Shuttl is being featured at Splunk’s Worldwide Users’ Conference 2012. I’ve talked about the benefits of Shuttl for efficiently and scalably bulk-moving Splunk data to HDFS for Archiving in a past blog announcing its availability, and here I’ll expand on how it enables the emerging theme of Big Data Integration.
Big Data Integration
In the big data space, the diversity of technologies is not only huge, but fast changing. Every time I hear about a new technology, the first thing I think of is, “How will it integrate with other data technologies?”
Despite much of the discussion about big data having to do with volume, latency, scalability, availability, consistency, flexibility, etc. it seems only when real projects are …
The Stockholm Technology Forum Unconference
On July 19th, Bontouch and Splunk sponsored the first Stockholm Technology Forum Unconference. The purpose was to bring local software professionals together to network, share, and discuss. The theme was Big Data (what else?), and it was held at Bontouch’s offices in Kungsholmen (one of the several islands making up Stockholm city).
If you are unfamiliar with what an unconference is, you can read the Wikipedia article about it. Basically, it’s attendee-led. As an organizer, it can be disconcerting and worrisome.
Will people attend? Stockholm in the summer is known for sunny long days set among the water. The city is relatively deserted of locals, who are out vacationing.
Will we run out of things to talk …
Splunk Shep Update
Late last year we announced a planned integration layer between Splunk and Hadoop. We called it the Shep project. We saw a tremendous response, signifying the pent-up interest in a Splunk-Hadoop integration. To me, it indicated that people just “got it.” What do I mean by that? They got that bringing Splunk technology to the Hadoop ecosystem meant a leap forward in making the promise of Big Data a reality to a huge segment of the industry.
As an example, areas where people saw immediate value were:
- Opening up Splunk-ingested data to a variety of groups building analytics on Hadoop in the Enterprise
- Using Splunk as a way to search and visualize data contained in Hadoop
- Using Splunk to
Shuttl for Big Data Archiving
As I mentioned in my last blog, archiving for big data is important. If you haven’t already, please read it before going on. If you have already read it, read it again. It’s important.
Are you back? OK.
- Data loss
- Organizing data
- Pluggable backend support
- Search for what’s been archived
- Selective “thaw” of frozen buckets
- Flushing of thawed buckets
So, how do you meet these challenges? I’m so glad you asked.
Shuttl is an open source product that works with Splunk to reliably archive your Splunk data …
4 Reasons You Need Big Data Archiving for Splunk
Calling all data hoarders: Splunk is collecting lots of data for you–what are you doing with it all? Some of you are letting it “roll” to oblivion: “Who needs it? It’s just taking up space!” Some are keeping it all in the live system: “I want all my data, all the time!” Many are rolling it out to another backend system for storage: “I might need this later!”
If the last one is your response, then congratulations, you chose wisely. Here are 4 reasons you really want to keep that data somewhere–but not in your live system:
- Data value is perishable; keeping unnecessary data on the live system will slow you down.
- Deleting data
Simple Splunking of HDFS Files
There’s something to be said about the power of command line interfaces. For simple things, they are simple. For complex things–well, maybe not so simple. Fortunately, I have a simple problem: I want to index a single file from a Hadoop Distributed File System (HDFS). To do this, I’ll use the CLI for both Splunk and Hadoop.
There are a few things we want to take into account when we index a file. Normally, indexing a log file in Splunk means creating an input to “monitor” that file. This enables you to not only index the file’s current contents, but also index subsequent appends. However, the contents of HDFS are typically historical files, so in this case, I don’t …
Swimming with Dolphins
Last week, Splunk posted its MySQL Connector on Splunkbase. The release may have escaped your notice, so I’ll describe what we’ve shipped, and why this is so significant. In fact, by the end of this blog, you’ll realize that to call it a “connector” is a modest understatement of what it actually is.
To start, there’s been a natural tension between the SQL and NoSQL worlds. Some have gone so far as to proclaim the end of SQL, which is a bit premature. SQL is still a key data technology, and will continue to be one. However, the world of data technologies is expanding. Rather than replacing existing incumbents, technologies are more and more diversifying, opening up possibilities that …
Data, Best Used By…
To state the obvious, “Big Data” is big. The deluge of data has people talking about the volume of data, which is understandable, but not as much attention has been paid to how the value of data can age. Value is often not just about volume. It can also be thought of as perishable.
When we think about the perishability of data, we find all sorts of everyday examples around us. When we pick up a daily newspaper, the headlines catch our attention. The value is in the recentness of the data. That is why we call it “news.” A year-old newspaper, in comparison, is usually useful for starting a fire, or lining a bird cage. The …