Hadoop 2.0 rant
Here we go, time for another rant about Hadoop, this time about Hadoop 2.0. You can read the first rant here.
The rant this time is about YARN and the way it stores the application logs.
Let’s start with looking at what the main uses of log files are and try to come up with a set of best practices for them. A log file can be used:
- by humans, to troubleshoot application problems
- by humans, to analyze/understand application behavior
- by other applications, to monitor/alert/react to certain application behaviors
- as a method of auditing/recording activity within an application (e.g. user action auditing)
From these use cases a few high-level best practices fall out almost immediately, the log …
Hunk: Raw data to analytics in < 60 minutes
Update: now with UI setup instructions
Summary of what we’ll do
1. Set up the environment
2. Configure Hunk
3. Analyze some data
So let’s get started ..
Minutes 0 – 20: Set up the environment
In order to get up and running with Hunk you’ll need the following software packages available/installed on the server running Hunk:
1. Hunk bits – download Hunk and you can play with it free for 60 days
2. Java – at least version 1.6 (or whatever is …
Hunk: Splunk Analytics for Hadoop Intro – Part 2
Now that you know the basic technology behind Hunk, let’s take a look at some of the features of Hunk and how they unlock the value of the data resting in Hadoop.
Defining the problem
More and more enterprises these days are storing massive amounts of data in Hadoop, with the goal that someday they will be able to analyze and gain insight from it and ultimately see a positive ROI. Since HDFS is a generic filesystem it can easily store all kinds of data, be it machine data, images, videos, documents, etc. If you can put it in a file, it can reside in HDFS. However, while storing the data in HDFS is relatively straightforward, getting value out of …
Hunk: Splunk Analytics for Hadoop Intro – Part 1
As you might have already seen, we recently announced the beta availability of our latest product, Hunk: Splunk Analytics for Hadoop. In this post I will cover some of the basic technology aspects of this new product and how they enable Hunk to perform analytics on top of raw data residing in Hadoop.
Introduction to Native Indexes
For those of you new to Splunk, please read this section to get a quick understanding of native indexes, as it will help differentiate them from virtual indexes. If you are already a Splunk guru, please feel free to skip this section.
Whenever a Splunk indexer ingests raw data from any source (file, script, network, etc.), it performs some processing on that data and …
Hadoop’s rise to fame is based on a fundamental optimization principle in computer science: data locality. Translated to Hadoop speak, that would be: move computation to data, not the other way around.
In this post I will rant about one core Hadoop area where this principle is broken (or at least not implemented yet). But, before that I will highlight the submission process of a MapReduce job that processes data residing in HDFS:
On the client:
1. gather all the correct confs, user input, etc.
2. contact the NameNode to get a list of files that need to be processed
3. generate a list of splits that Map tasks need to run on, by:
3.1 for each file returned …
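As a rough illustration of step 3 and of why data locality matters here, below is a minimal Python sketch of split generation for a single file. It is not Hadoop's actual code: Block, Split and generate_splits are hypothetical names, the host names are made up, and the sketch simply assumes each split prefers the hosts of the block it starts in.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    offset: int       # byte offset of the block within the file
    length: int       # block length in bytes
    hosts: List[str]  # DataNodes holding a replica (made-up names)


@dataclass
class Split:
    path: str
    start: int
    length: int
    hosts: List[str]  # where a Map task would ideally be scheduled


def generate_splits(path: str, blocks: List[Block], split_size: int) -> List[Split]:
    """Cut one file into splits, tagging each with the hosts of the block it starts in."""
    splits = []
    file_len = sum(b.length for b in blocks)
    start = 0
    while start < file_len:
        length = min(split_size, file_len - start)
        # Find the block that contains the first byte of this split.
        block = next(b for b in blocks if b.offset <= start < b.offset + b.length)
        splits.append(Split(path, start, length, block.hosts))
        start += length
    return splits


# Hypothetical 200-byte file stored as two blocks.
blocks = [Block(0, 128, ["dn1", "dn2"]), Block(128, 72, ["dn2", "dn3"])]
splits = generate_splits("/data/a.log", blocks, split_size=128)
for s in splits:
    print(s)
```

The scheduler can then try to run each Map task on one of the hosts listed in its split, which is the "move computation to data" idea in practice.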
Connecting Splunk and Hadoop
Finally I am getting some time to write about some cool features of one of the projects that I’ve been working on - Splunk Hadoop Connect. This app is our first step in integrating Splunk and Hadoop. In this post I will cover three tips on how this app can help you, all of them based on the new search command included in the app: hdfs. Before diving into the tips I would encourage you to download, install and configure the app first. I’ve also put together two screencast videos to walk you through the installation process:
Got pony – APAC style!
After the pony-fication of the London office we set up a challenge for Eugenia to make our APAC office “complete”. So, she searched everywhere for the best fitting pony, but …. she had a hard time setting her heart on one … so she got two instead
I am proud to present Butterpac and Butterbar – APAC’s very own ponies !!!!!
Just in case you thought there was some visual trick that duplicated the ponies
Welcoming Butterbar & Butterpac to the family !
Pony riding pony FTW !!!!
Rumor has it these are no simple ponies, if you press the ear (dunno which one) neighs and galloping sounds will be played for your pleasure…
Splunk’s UK office now has its very own pony – meet Butternut!
I was visiting our UK office for 2 weeks for partner/support training. This was my first time in London so I found a few things surprising: a) most of the beers served in pubs are flat, wtf? b) the Brits love fried stuff c) the Splunk office was a bit low energy, something was missing. However, the latter all changed when one day on our way to lunch Jaleh, one of our coworkers, noticed a pony on display – everyone was super excited and we just had to get it !!! For some reason, HR had to approve this first
This is what happened the following day
Cannot search based on an extracted field
UPDATE: in 4.3 and later, search-time fields extracted from indexed fields work without any further configuration
In the past couple of days I had to help people from support and professional services troubleshoot the exact same problem twice, so chances are it might be useful for you too
I have set up a regex-based field extraction, let’s say the field name is MyField. When I run a search, say “sourcetype=MyEvents”, I see that the field is extracted correctly. However, when I run a search based on a value of MyField, say “sourcetype=MyEvents MyField=ValidValue”, nothing gets returned. WTF?
For the impatient, here’s how to solve this.
$SPLUNK_HOME/etc/system/local/fields.conf
[MyField]
INDEXED_VALUE = false
In order to understand …
Storing encrypted credentials
Splunk 4.2 was released today and your new resolution:
Build the greatest Splunk app that gathers data from all kinds of sources, some that are public and others that require credentials, indexes them in Splunk and then does some cool things with it.
This blog post will only be concerned with one small but important aspect of your great app: how to securely store user credentials yet be able to safely access them in clear text when needed. I will split up the post into four sections: get credentials from the user, access them from your script, where the credentials are stored, and security implications.
Get and securely store user credentials
The best time to get user credentials for your app …