Splunk and AWS sizing revisited

Sometime last year, I posted recommendations for running Splunk on Amazon Web Services (AWS). While the base recommendations for how to size and architect Splunk have not changed, we now have more clarity on what works best. Instead of editing that post, I decided it would be best to review the thought process and give more color on what most people are doing with it. Before going down the road of sizing on EC2, I highly recommend reviewing our standard documentation.

For general sizing purposes, there are two key factors:

  1. Daily Indexed Volume (how many GB indexed per day?)
  2. Searching and Reporting needs (how many searches or alerts will be

» Continue reading

Restoring an index

In a recent post, I covered some details around a backup strategy.  I left a bit of a teaser at the end, stating I would follow up with a post on index restoration.   Well, here it is…

There are a few scenarios you may encounter when trying to restore or recover an index.  The simplest scenarios, such as moving an index, are covered very well in the moving indexes wiki topic as well as on our answers site.  From a high level, you can move indexes across Splunk installations but must consider the following:

  • The Splunk instance receiving the index has never been configured with an index of the same name –

» Continue reading

Splunk and Chef

For those of you who run Chef in your Splunk environment, or are thinking of doing so, I have some great news. There is now an open source code base on GitHub. Big thanks to Bryan Brandau and Aaron Peterson for working on this! Here is the official tweet and link:

https://twitter.com/#!/agent462/status/154640900566433792

https://github.com/bestbuycom/splunk_cookbook

» Continue reading

Index backup strategy

In this post, I’ll cover one strategy to back up your index. Before we go any further…

  • Do not do any of this on your production system without testing
  • This applies to version 4.2.x only
  • You should have a very good understanding of Splunk administration, indexes, and buckets (http://docs.splunk.com/Documentation/Splunk/4.2.4/admin/HowSplunkstoresindexes)
  • Read this:  http://docs.splunk.com/Documentation/Splunk/4.2.4/Admin/Backupindexeddata

Let’s assume we have a standalone Splunk deployment that indexes 10 GB/day.  Our goal is to make sure we have a backup on a daily basis, extending all the way back to Splunk’s first received event.   The strategy encompasses a few steps that basically take chunks of the index at set intervals.  We will accept the potential to lose data for the last day, but want…

» Continue reading

Choosing a Forwarder, or not

When deploying Splunk in the wild, there is the task of deciding “to forward, or not to forward”.  This decision comes down to many factors, but the typical response/answer is to use the forwarder.  In this blog, I’ll detail that decision process so you can decide for yourself.

First, let’s quickly explain what a Forwarder does…if you already know, skip to the next paragraph. Splunk can perform four basic functions: searching, indexing, forwarding, and acting as a deployment server. When Splunk is set up to be a forwarder, it reads in the raw data and sends it to a Splunk indexer. In the latest version of Splunk, we offer an additional software package especially for forwarding (only). This is…

» Continue reading

How can I get 2 days of in person Splunk training?

Come to .conf 2011!!!  The 2nd annual Splunk user’s conference is upon us in a few weeks.   It is hard to describe how much knowledge is spread throughout the Splunk world in such a short period of time.   Attendees get the latest on Splunk, best practices, solutions insight, product direction, and tons of training content.  Our best engineers are presenting really cool content and this is your chance to interact with them.  While Splunk’s best talent is there, our best customers are also there presenting really cool stuff.  I must say after watching some of the customer presentations last year, I came away with new ideas on how to get more value out of Splunk at other…

» Continue reading

Splunk ate my homework…

Back when I first joined Splunk, I recall our CEO mentioning how Splunk could do everything – including his son’s homework.  Using Splunk to replace Excel for graphing/reporting is a cute trick, but I never thought it might actually be useful for real homework.   Well, fast forward about 3 years and many use-cases later…

Last week, I was brainstorming with a Master’s degree student about how to gather metrics on their group project. The class was Advanced Computer Architecture at Santa Clara University. After a few minutes of discussion, the student decided that a script to ingest, parse, and output CSV data would be the right solution. From there, they could then plot things in Excel using…

» Continue reading

Splunk and EC2

NOTE:   There is a new and updated post on this topic located here.

Over the past year, Splunk has increased its footprint for installations on Amazon EC2. Along with this come questions about best practices and recommendations for deploying in a cloud environment. In this post, I’ll provide some guidance around deploying Splunk on EC2. It is important to note that the search and indexing load will dictate the hardware requirement. The following link contains the appropriate guidelines for sizing:   HERE

Let us first review our ‘reference’ server configuration for a deployment that indexes 10-100 GB per day:

  • 8 cores (2 quad core, > 2.5 GHz)
  • 8+ GB RAM
  • RAID 1+0 Disk

» Continue reading

Who is searching in Splunk?

I was recently assisting a customer in an attempt to build a dashboard around usage patterns.   One of the requirements was to detail the metrics around searching.   Specifically, who is running searches and from where?  In the vanilla Splunk Web interface, we package a dashboard that includes a view for unscheduled and scheduled search patterns.   To get to this view, simply navigate to the main Splunk Web interface > Status pull-down > Search activity > Search details.   While this is great, I ran into a problem with my metrics…

Since most users are querying Splunk through the API via a script, these searches appear as unscheduled occurrences.   In this particular environment, these scripted searches occur…

» Continue reading

Staffing for Splunk

How do I staff for Splunk?  A lot of people ask this question and there is not a great deal of information about the topic.  While Splunk can be easy to use and maintain, proper care must be taken to ensure a healthy running Splunk instance.  In this posting, I will detail the most important items with respect to staffing.  First, let’s break this down into 2 topics.

  1. Required skill sets to administer and maintain a Splunk installation
  2. Resource levels required for different installation sizes.

Skill Sets

Let’s start out by considering Splunk-specific skills for an administrator. We recommend that a Splunk administrator start out by attending our user course and administration course. These courses will…

» Continue reading