Managing Index sizes in Splunk

When deploying Splunk, the topic of how to manage index sizes will surface. The following is a detailed scenario showing how you can manage index space in Splunk (valid for pre-4.2.x lines of Splunk; this is much easier in 4.2 and higher):

There are a few key factors that influence how much attention you must pay to disk space management.  These factors are:

  • Total Disk Space Available
  • Ratio of Local/Non-Local Disk Space
  • Retention Policy

The first thing you should be aware of is the minimum free disk space setting (minFreeSpace under the [diskUsage] stanza in server.conf). This setting tells Splunk to halt indexing when the amount of free disk space drops to this value. By default, it is set to 2000 (MB). For enterprise deployments, you may need to move data around to make space, and the 2 GB limit is too small; setting this to 20 GB or more may be ideal. To set it to 20 GB, create or edit the $SPLUNK_HOME/etc/system/local/server.conf file as follows:

[diskUsage]
minFreeSpace = 20000
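Before committing to a threshold, it can help to check how much free space the partition actually has. A minimal sketch in Python; the 20000 MB value and the "/" mount point are just this example's assumptions:

```python
import shutil

# Threshold we plan to set as minFreeSpace in server.conf (MB).
MIN_FREE_SPACE_MB = 20000

# Free space on the partition that holds $SPLUNK_DB ("/" in this example).
free_mb = shutil.disk_usage("/").free // (1024 * 1024)

if free_mb <= MIN_FREE_SPACE_MB:
    print(f"WARNING: only {free_mb} MB free; Splunk would halt indexing")
else:
    print(f"OK: {free_mb} MB free, above the {MIN_FREE_SPACE_MB} MB threshold")
```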

The next topic of importance is the amount of local disk space. If all of your disks are local, then you do not need to be concerned with the following details. If your Splunk system has a non-local partition used for long-term storage, then you will need to manage the settings for where Splunk puts older data. There is a significant amount of information and terminology related to this topic, so we will break things down with an example scenario. Let us assume my system is as follows:

  • 300 GB of local storage (RAID 10 w/fast disks), mounted as the root partition /
  • 1.0 TB of non-local storage (NFS mounted partition or SAN), mounted to /storage

Splunk strongly recommends that indexing and searching take place on the local disks.  Therefore, we should set Splunk to index data as follows:

  • hot and warm buckets will be stored on local storage
  • cold buckets on the non-local storage

To set this for the main index, you would use the following settings in your indexes.conf file:

[main]
homePath = $SPLUNK_DB/defaultdb/db
coldPath = /storage/defaultdb/colddb
thawedPath = /storage/defaultdb/thaweddb

Now that we have told Splunk where to put the data, we still need to tell it how much space we have available within each location.  To do this, you need to calculate the total need by using the following formula:

LocalDiskSpace = minFreeSpace + (maxHotBuckets * maxDataSize) + (maxWarmDBCount * maxDataSize)

Let’s break down each value in the above equation:

  • LocalDiskSpace = Total Available Space on the local storage partition (300 GB in our case)
  • minFreeSpace = amount of available free disk space until Splunk halts indexing (assume 20 GB as setup earlier)
  • maxHotBuckets = Maximum number of Hot buckets to be spawned.   By default, the main index is set to 10.   All others will use 1 by default.
  • maxDataSize = Bucket Size in MB.  Note that auto=750 MB and auto_high_volume=10 GB.  You can also manually set this by using numeric values in MB.  Since the main index defaults to auto_high_volume, we can assume 10 GB.
  • maxWarmDBCount = total number of warm buckets.  Remember, indexed data transitions from hot > warm > cold.  By default, this is set to 300, and this is the value we should be calculating.

Substituting the above values in our formula gives us the following math:

300 = 20 + (10 * 10) + (300 * 10)

300 ≠ 3120
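The mismatch can be double-checked with a few lines of Python, using the default values from the list above (all values in GB):

```python
# All values in GB, per the example scenario.
min_free_space = 20        # minFreeSpace (20000 MB)
max_hot_buckets = 10       # default for the main index
max_data_size = 10         # auto_high_volume = 10 GB per bucket
max_warm_db_count = 300    # default maxWarmDBCount

# LocalDiskSpace needed = minFreeSpace + hot bucket space + warm bucket space
required_local = (min_free_space
                  + max_hot_buckets * max_data_size
                  + max_warm_db_count * max_data_size)

print(required_local)  # 3120 GB -- far more than the 300 GB we actually have
```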

Since the local disk space does not match the bucket sizing, we should adjust the number of buckets and/or the size of the buckets.  The best practice is to adjust only the maxWarmDBCount.  This is for two reasons: it is ideal to have multiple hot buckets for bucket-span purposes, and maxDataSize is optimally tuned out of the box.  If necessary, you could configure between 3 and 5 maxHotBuckets, as that still allows for a broad range of bucket spans.  To revisit our equation algebraically:

maxWarmDBCount = (LocalDiskSpace - minFreeSpace - (maxHotBuckets * maxDataSize)) / maxDataSize

maxWarmDBCount = (300-20-100)/10

maxWarmDBCount = 18

We now have our warm bucket count, which can be set in the main index stanza.  A more user-friendly form of the same equation for determining sizing:

maxWarmDBCount = LocalDiskSpace/maxDataSize - minFreeSpace/maxDataSize - maxHotBuckets
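Both forms of the rearranged equation give the same answer; a quick check in Python with our example numbers (GB units):

```python
local_disk_space = 300   # local storage partition
min_free_space = 20      # reserved free space
max_hot_buckets = 10     # hot buckets on the main index
max_data_size = 10       # GB per bucket (auto_high_volume)

# Rearranged form: solve for the number of warm buckets that fits.
max_warm_db_count = (local_disk_space - min_free_space
                     - max_hot_buckets * max_data_size) // max_data_size

# Equivalent "friendlier" form.
alt = (local_disk_space // max_data_size
       - min_free_space // max_data_size
       - max_hot_buckets)

print(max_warm_db_count, alt)  # 18 18
```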

Now that we have the local storage sorted out, we must tune the total index size.  With the hot and warm buckets already tuned, Splunk will automatically ‘freeze’ (i.e. delete or archive) the oldest cold bucket once the index reaches its maximum size.  Here is the equation to calculate the maximum index size:

NonLocalDiskSpace = maxTotalDataSizeMB - (LocalDiskSpace - minFreeSpace)

maxTotalDataSizeMB = NonLocalDiskSpace + LocalDiskSpace - minFreeSpace

maxTotalDataSizeMB = 1000000 + 300000 - 20000

maxTotalDataSizeMB = 1280000
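The same total falls out of a quick check in Python (all values in MB this time):

```python
non_local_disk_space_mb = 1000000  # 1.0 TB non-local storage
local_disk_space_mb = 300000       # 300 GB local storage
min_free_space_mb = 20000          # minFreeSpace reserved on local disk

# Total index size: all non-local space plus the usable portion of local space.
max_total_data_size_mb = (non_local_disk_space_mb
                          + local_disk_space_mb
                          - min_free_space_mb)

print(max_total_data_size_mb)  # 1280000
```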

In summary, our main index will be broken down as follows:

  • 100 GB – Hot buckets stored in $SPLUNK_DB/defaultdb/db
  • 180 GB – Warm buckets stored in $SPLUNK_DB/defaultdb/db
  • 1000 GB – Cold buckets stored in /storage/defaultdb/colddb

The final stanza in the indexes.conf file would look like this:

[main]
homePath = $SPLUNK_DB/defaultdb/db
coldPath = /storage/defaultdb/colddb
thawedPath = /storage/defaultdb/thaweddb
maxWarmDBCount = 18
maxTotalDataSizeMB = 1280000

Thanks Simeon, you’re the best! Just handed this off to the client architect team.

Joshua Rodman
February 16, 2011

Simeon,

Your math is wrong at the end. It should be (assuming still working in MB):

maxTotalDataSizeMB = 1000000 + 300000 – 20000

maxTotalDataSizeMB = 1280000

You’ve just told your Splunk instance your max data size is 1.280GB in your example.

dpaper
April 7, 2011

dpaper: thanks for the correction… all fixed now.

simeon
April 8, 2011

