Using Hadoop RecordReaders with Hunk

Hunk can process any data format for which there is a RecordReader, a.k.a. a preprocessor. In previous posts, we showed you how to use preprocessors to search image data with Hunk and how to write your own RecordReader. In this post, you’ll learn how to use existing Hadoop RecordReaders with Hunk, without any modifications!

Hunk’s Hadoop RecordReader requirements

The prerequisites for using Hadoop RecordReaders with Hunk are:

  • The RecordReader has a constructor that takes no arguments.
  • The RecordReader uses RecordReader.initialize(InputSplit, TaskAttemptContext) for initialization.
  • The .toString() method of the value object that your RecordReader returns from .getCurrentValue() returns a data representation that Splunk understands, i.e. you want to have overridden java.lang.Object’s default implementation
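To illustrate the third requirement, here is a minimal sketch of a value class whose toString() yields text Splunk can parse. The class name LogEvent and the key=value output format are assumptions for illustration; they are not part of Hadoop or Hunk. Without the override, Object.toString() would print something like LogEvent@1b6d3586, which carries no usable data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical value type that a RecordReader might return from getCurrentValue().
class LogEvent {
    private final long epochSeconds;
    // LinkedHashMap keeps fields in insertion order, so the output is stable.
    private final Map<String, String> fields = new LinkedHashMap<>();

    LogEvent(long epochSeconds) {
        this.epochSeconds = epochSeconds;
    }

    LogEvent put(String key, String value) {
        fields.put(key, value);
        return this;
    }

    // Override Object.toString(): emit one line of key="value" pairs,
    // escaping embedded double quotes so each value stays one token.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("time=").append(epochSeconds);
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(' ').append(e.getKey()).append("=\"")
              .append(e.getValue().replace("\"", "\\\""))
              .append('"');
        }
        return sb.toString();
    }
}
```

For example, new LogEvent(1400000000L).put("user", "alice") renders as time=1400000000 user="alice".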
» Continue reading

Faster and limitless Hunk archiving to S3 with Hadoop 2.6.0

We’ve learned that Hunk can archive Splunk buckets to HDFS and S3. In this post, we’ll see how to use the new S3 integration introduced in Apache Hadoop 2.6.0 to get better performance and avoid the 5GB file-size upload limit.

Edit: While the title states “limitless” the actual limit of a single file object in S3 is currently 5TB.

The new S3 filesystem – S3A

Apache Hadoop 2.6.0 incorporates a new S3 filesystem implementation that performs better and supports uploads larger than 5GB. The new S3 filesystem is named S3A. You use it with Hadoop by giving your paths an s3a prefix, like so: s3a://<bucket>/<path>. This should be familiar to you if …
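To make the s3a://<bucket>/<path> form concrete, this sketch splits such a path into its bucket and object key using java.net.URI. The helper class S3aPath is hypothetical and exists only for illustration; Hadoop itself resolves these paths through its own FileSystem machinery.

```java
import java.net.URI;

// Hypothetical helper: split an s3a://<bucket>/<path> string into its parts.
class S3aPath {
    final String bucket;
    final String key;

    S3aPath(String path) {
        URI uri = URI.create(path);
        // The scheme ("s3a") is what selects the new S3A filesystem.
        if (!"s3a".equals(uri.getScheme())) {
            throw new IllegalArgumentException("expected an s3a:// path: " + path);
        }
        // getAuthority() rather than getHost(): it does not return null
        // for bucket names that are not valid host names.
        this.bucket = uri.getAuthority();
        String p = uri.getPath();                 // "/<path>"
        this.key = p.startsWith("/") ? p.substring(1) : p;
    }
}
```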

» Continue reading

New in Hunk 6.2.1: Splunk Archiving & Searchable Archives!

  • Archive your existing Splunk indexers’ data with Hunk 6.2.1
  • Search archived data in place from the Hunk search head
  • Documentation here!

Archive Splunk Data

Hunk 6.2.1 enables you to continuously archive your Splunk data to Hadoop by pointing a Hunk search head to your Splunk indexers and configuring new Archive Indexes.

Searching archived data

You can search archived data in place on Hadoop just as easily as you would search any other Splunk index. There’s no need to move data more than once. This works because Hunk already knows how to efficiently search data in Hadoop. We just had to archive the data in a file structure such that Hunk could efficiently prune the data by time.
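Pruning by time works when the time range of each archived unit can be read from its path alone. The sketch below assumes a hypothetical layout where every archived bucket lives in a directory named like a Splunk bucket, db_<latestEpoch>_<earliestEpoch>_<id>. This is an illustration of the general idea, not Hunk’s documented archive format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical layout: one directory per archived bucket, named
// db_<latestEpoch>_<earliestEpoch>_<id> (an assumption for illustration).
class TimePruner {

    // Returns only the bucket directories whose [earliest, latest] range
    // overlaps the search window [searchEarliest, searchLatest] (epoch seconds).
    static List<String> prune(List<String> bucketDirs, long searchEarliest, long searchLatest) {
        List<String> keep = new ArrayList<>();
        for (String dir : bucketDirs) {
            String[] parts = dir.split("_");
            if (parts.length < 4 || !parts[0].equals("db")) {
                continue; // not a bucket directory
            }
            long latest;
            long earliest;
            try {
                latest = Long.parseLong(parts[1]);
                earliest = Long.parseLong(parts[2]);
            } catch (NumberFormatException e) {
                continue; // malformed name: skip it
            }
            if (latest >= searchEarliest && earliest <= searchLatest) {
                keep.add(dir); // ranges overlap: bucket may hold matching events
            }
        }
        return keep;
    }
}
```

Only the directory names are inspected, so buckets outside the search window are never opened or read.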

Here’s …

» Continue reading

Hunk Preprocessors: How to DIY

In the previous blog post on image searching with Splunk, I showed you how to preprocess data with Hunk so that you can Splunk any data. This blog post is all about how to do it yourself.

Code!

Before we start, here are links to the code for the image preprocessor demo:

github.com/splunk/hunk-demo-image-reader
github.com/splunk/hunk-demo-image-viewer

The first link has all the preprocessor code and the second has the code for the sweet image UI. You can look at it before, during, and/or after reading the rest of the blog post. Enjoy!

Background

A Hunk preprocessor is basically just a Hadoop RecordReader<K, V>, where K is irrelevant and V is Text. We provide a base class that …
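The shape of that contract can be sketched without Hadoop on the classpath. The interface below is a hypothetical stand-in that mirrors only the iteration methods of org.apache.hadoop.mapreduce.RecordReader (nextKeyValue, getCurrentKey, getCurrentValue); a String plays the role of Text, and the real methods also declare IOException and InterruptedException. The input format, fixed 12-byte records of an 8-byte timestamp plus a 4-byte reading, is likewise an assumption.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the iteration part of Hadoop's RecordReader<K, V>;
// NOT the Hadoop class itself.
interface TextRecordSource<K> {
    boolean nextKeyValue();
    K getCurrentKey();
    String getCurrentValue(); // stands in for the Text value
}

// Turns hypothetical binary records (8-byte epoch seconds + 4-byte int) into text.
class BinaryToTextReader implements TextRecordSource<Long> {
    private static final int RECORD_SIZE = 12;
    private final ByteBuffer buf;
    private long recordIndex = -1;
    private String current;

    BinaryToTextReader(byte[] data) {
        this.buf = ByteBuffer.wrap(data); // big-endian by default
    }

    @Override
    public boolean nextKeyValue() {
        if (buf.remaining() < RECORD_SIZE) {
            current = null;
            return false; // no complete record left
        }
        long time = buf.getLong();
        int reading = buf.getInt();
        recordIndex++;
        current = "time=" + time + " reading=" + reading;
        return true;
    }

    @Override
    public Long getCurrentKey() {
        return recordIndex; // the key is irrelevant to Hunk; an index is enough
    }

    @Override
    public String getCurrentValue() {
        return current;
    }
}
```

The caller loops while nextKeyValue() returns true and consumes getCurrentValue() as one text event per record.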

» Continue reading

Image Search with Splunk and Hunk

One of the sexy new features Hunk brings to the Splunk 6 smorgasbord is preprocessing data. Since Hunk is built on top of Hadoop’s MapReduce framework, we can utilize its preprocessing framework. Basically, you can now take any data, write a piece of code that turns it into text, and then search the data where it is stored!

Update: Code is open sourced here!

I’ve created a demo where you can select colors and get images that match the selection. It looks like this:

Image searching in Splunk? How is this possible? Indexing images?

Indexing images? No. Preprocessing at search time, so there are no indexing costs.
I do this by searching a set of images stored on HDFS; my preprocessor extracts the color distribution …
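The excerpt does not spell out how the color distribution is computed, so the following is only a minimal sketch under a simple assumption: quantize each RGB channel to four levels (64 coarse color bins) and emit the fraction of pixels in each non-empty bin as one text line. The class ColorDistribution and the cRGB bin naming are hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical color-distribution extractor: 4 levels per channel, 64 bins.
class ColorDistribution {

    static String describe(String imageName, int[] argbPixels) {
        // TreeMap keeps bins sorted so the output line is deterministic.
        Map<String, Integer> counts = new TreeMap<>();
        for (int p : argbPixels) {
            int r = (p >> 16) & 0xFF;
            int g = (p >> 8) & 0xFF;
            int b = p & 0xFF;
            // Keep the top 2 bits of each channel, e.g. "c300" = strong red.
            String bin = "c" + (r >> 6) + (g >> 6) + (b >> 6);
            counts.merge(bin, 1, Integer::sum);
        }
        StringBuilder sb = new StringBuilder("image=").append(imageName);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            double fraction = (double) e.getValue() / argbPixels.length;
            sb.append(' ').append(e.getKey()).append('=')
              .append(String.format(Locale.ROOT, "%.4f", fraction));
        }
        return sb.toString();
    }
}
```

With java.awt.image.BufferedImage, the pixel array could come from image.getRGB(0, 0, width, height, null, 0, width). Each resulting line is plain text, so color searches become ordinary field comparisons.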

» Continue reading

Splunking for the homeless

Looking for apartments in San Francisco on craigslist is very time-consuming. Since Splunk wants me to work and not spend all my time browsing craigslist, I decided to create a hack that alerts me whenever there’s an apartment I’m interested in.
I did this with a bash script and Splunk.

Preparing data for Splunk

When you search for apartments on craigslist, you get a nice URL that contains all your apartment constraints. We want to know when there’s a new apartment ad, so we fetch the most recent ad and let Splunk figure out whether it’s new.
The script for doing this:
#!/bin/bash

craigslist_search="http://sfbay.craigslist.org/search/apa/sfc?zoomToPosting=&query=&srchType=A&minAsk=2000&maxAsk=3500&bedrooms=2&nh=4&nh=11&nh=10&nh=18&nh=29"

curl -s "$craigslist_search" | \
  sgrep -o "%r\n" -i …
» Continue reading

Happy Halloween greetings from Splunk!

» Continue reading

Splunk @ jQuery developer summit 2012

Background

jQuery developer summit is an annual opportunity to meet the entire jQuery team and learn how you can get involved in the jQuery community. They tell you how each jQuery project works, what’s being done, and how you can contribute to the projects.
Splunk sponsored the event and has given the jQuery Foundation a non-profit Splunk license.
I traveled all the way to AOL in Dulles, Virginia, for this event, together with four awesome Splunkers from the San Francisco office.

Sunday evening

jQuery treated us and all the other attendees to food, beer, and a chair from Sunday to Tuesday. It was a very good time, and we met a lot of very nice people.
Before the event …

» Continue reading

Splunkgit – Github just got Splunked! (Part 4/4)

This is the fourth and last part in a four-part series where Petter and Emre cover their Splunk app, Splunkgit. The Splunk app is available for download on Splunkbase here, and it is also on GitHub here.

In the first and second parts, we explained the technical details of how to fetch the data. In this post and the previous one, we show you the actual data and how we chose to visualize it.

Usages of the git tab


Let’s check out what our Splunkgit app can do with our git repository data.
We’ve made four dashboards to show off some examples of how you can visualize this data. The ones we’ve made are:

  • Files
  • Authors
  • File
» Continue reading

Splunkgit – Github just got Splunked! (Part 2/4)

This is the second part in a four-part series where Petter and Emre cover their Splunk app, Splunkgit. The Splunk app is available for download on Splunkbase here, and it is also on GitHub here. You can find part 1 here if you missed it.

Who am I?

Hello there, blog reader! As this is my first post, I will do what I am told and introduce myself. My name is Petter Eriksson, and I study computer science at the Royal Institute of Technology in Stockholm, Sweden. I am interning here at Splunk for six months, during which I will be doing software development and writing my master’s thesis. Being an intern here at Splunk has been great so …

» Continue reading