Clustering Optimizations in Splunk 6
One of the new features we introduced in Splunk 6 is Simplified Clustering Management. This allows administrators to set up and monitor the health of the cluster through an easy-to-use, intuitive UI. In addition to the cool new UI, many performance optimizations were added so the cluster can handle peer failures, and recover from them, blazingly fast. In this blog post, I’m going to highlight two such performance optimizations.
1. First Searchable Copy Optimization
This optimization is all about making sure that at least one complete searchable copy exists in the cluster, so that business users can continue to use the data while the cluster master is handling peer failures.
Let’s take a look at this with an example. Assume …
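For context, the number of searchable copies the master tries to maintain is controlled by the search factor, alongside the replication factor. A minimal sketch of the relevant server.conf stanza on a cluster master (the values are illustrative, not taken from the post):

[clustering]
mode = master
replication_factor = 3
search_factor = 2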
Exporting Large Results Sets to CSV
You want to get data out of Splunk. So you run the search you want and create the table you want in the search app. The results are hundreds of thousands of rows, which is good. So you click on the Export button and download the results to CSV. When you open the file, you see only 50,000 rows. Is this a common problem? Not really. It’s a large enough result set that most people would rather keep it in Splunk for analysis. However, there are times when such a large export is required, and you really don’t want to log on to the Splunk server to get it either. So how do you proceed?
I recently bumped into this problem myself …
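One approach that sidesteps the UI export limit, which may or may not be where this post is heading, is to pull the results through Splunk’s REST search export endpoint, which streams every row. A rough sketch (host, credentials, and the search itself are placeholders):

curl -k -u admin:changeme -o results.csv \
  https://localhost:8089/services/search/jobs/export \
  --data-urlencode 'search=search index=_internal | head 200000' \
  -d output_mode=csv

Because the export endpoint streams results as they are produced, it isn’t subject to the row cap you run into with the Export button.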
Microsoft Patch Tuesday! Are your servers patched?
It’s my most favorite time of the month – Patch Tuesday! Ok, I might be slightly exaggerating there. Let’s face it. It’s a pain in the neck. I have to go around to every server in my development environment and ensure that all the critical patches have been taken care of. Usually, this means a trip to Windows Update, or checking the logs of the Windows Server Update Services (WSUS) server. Today, I woke up and decided Splunk was going to assist with this.
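As a rough sketch of the kind of search that helps here (the sourcetype and field names depend on how you collect Windows event logs, so treat them as assumptions): the Windows Update client writes event ID 19 to the System event log for every successfully installed update, so something like this shows the latest successful patch per host:

sourcetype=WinEventLog:System SourceName="Microsoft-Windows-WindowsUpdateClient" EventCode=19 | stats latest(_time) AS last_update by host | convert ctime(last_update)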
Disk Space Estimator for Index Replication
One of the first questions customers ask when they start considering index replication is about storage requirements. Index replication keeps additional copies of data for redundancy, so the main questions are how it affects storage needs and which factors to consider when designing a scalable storage architecture. I’ll cover the important factors in this blog post.
There are two major dimensions to consider: the first is the replication policies, and the second is the data retention period.
Replication Factor (RF) and Search Factor (SF) control the replication policies. RF determines how many copies of the raw data the cluster keeps, while SF determines how many of those copies also keep the time-series index files and are therefore searchable. For syslog data, the raw data …
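To make the arithmetic concrete, here is a back-of-the-envelope sketch using the commonly quoted rule-of-thumb compression ratios (roughly 15% of raw volume for the compressed raw data and roughly 35% for the index files; actual ratios vary by data type, so treat these numbers as assumptions):

daily indexing volume = 100 GB/day, RF = 3, SF = 2, retention = 30 days
raw data per copy               ≈ 100 GB × 0.15 = 15 GB/day
index files per searchable copy ≈ 100 GB × 0.35 = 35 GB/day
cluster total per day           ≈ (15 × 3) + (35 × 2) = 115 GB/day
over the 30-day retention       ≈ 115 GB × 30 ≈ 3.5 TB across the cluster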
Replicate your data
Imagine a scenario in which one of your Splunk indexers abruptly goes down due to a hardware failure. The data stored on that indexer isn’t available for searching until the indexer is restored. Your business users are unhappy because they’re unable to act on very important historical data.
This scenario can be completely avoided, thanks to a new feature in Splunk 5.0 called Index Replication. Index replication allows IT administrators to specify and store redundant copies of the data across a cluster of indexers. When one of the indexers is down, the system automatically detects the failure and redirects search queries to other available indexers, which have the data. Everything happens so seamlessly that your business users …
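Under the hood, this behavior is driven by the clustering settings in server.conf. A minimal sketch of what a peer indexer’s configuration might look like (the host name, port, and key below are illustrative placeholders):

[replication_port://9887]

[clustering]
mode = slave
master_uri = https://cluster-master.example.com:8089
pass4SymmKey = yourSecretKey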
The Magic behind Report Acceleration
One of the coolest features we’ve introduced in Splunk 5.0 is Report Acceleration. It speeds up reports by many orders of magnitude, and it is very easy to set up. So, what is the secret behind such powerful acceleration? In this post, I’ll attempt to explain some of the concepts that power report acceleration.
Before report acceleration, one of the ways users could speed up reports was through summary indexing. Although very powerful, summary indexing was better suited to Splunk admins than to report developers. It also had no way to automatically update its summaries to back-fill data, and it stored the summaries on the search heads instead of on the indexers.
Report acceleration is targeted …
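For a sense of what an accelerated report looks like in configuration terms, here is a hedged sketch: acceleration is normally enabled from the UI, but the equivalent savedsearches.conf entry looks roughly like this (the report name, index, and time range are made up):

[Web errors by status]
search = index=web sourcetype=access_combined status>=500 | timechart count by status
auto_summarize = 1
auto_summarize.dispatch.earliest_time = -30d@d

The report has to use streaming and transforming commands (like the timechart above) for Splunk to be able to build and maintain the summary.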
You’re happier with fewer friends
Using the new Splunk Sentiment Analysis app, I was able to correlate how positive tweets were with how many people follow a Twitter account. It’s a slight stretch, but essentially: are you happier with more friends?
index=twitter | sentiment twitter body | chart avg(sentiment) by actor.followersCount
It seems that people with smaller circles of friends are more positive, and more friends equals more negativity, up until about 75 friends. Seems like a fairly good life lesson, but take it with a grain of salt, since spam Twitter accounts may skew things.…
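One way to make a chart like this a bit easier to read, and to blunt the effect of a handful of outlier accounts, is to bucket the follower counts before charting. A hedged variation on the search above, using the same assumed fields from the Twitter data:

index=twitter | sentiment twitter body | bin actor.followersCount span=10 | chart avg(sentiment) by actor.followersCount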
Book Excerpt: Finding Specific Transactions
EXCERPT FROM “EXPLORING SPLUNK: SEARCH PROCESSING LANGUAGE (SPL) PRIMER AND COOKBOOK”. Kindle/iPad/PDF available for free, and hardcopy available for purchase at Amazon.
You need to find transactions with specific field values.
A general search for all transactions might look like this:
sourcetype=email_logs | transaction userid
Suppose, however, that we want to identify just those transactions where there is an event that has the field/value pairs to=root and from=msmith. You could use this search:
sourcetype=email_logs | transaction userid | search to=root from=msmith
The problem here is that you are retrieving all events from this sourcetype (potentially billions), building up all the transactions, and then throwing 99% of the data right into the bit bucket. Not only is it …
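One common way to avoid that waste, which may or may not be the approach the excerpt goes on to describe, is to use a subsearch to find only the userids that have a matching event, and then build transactions for just those users:

sourcetype=email_logs [search sourcetype=email_logs to=root from=msmith | dedup userid | fields userid] | transaction userid | search to=root from=msmith

The subsearch returns the list of userid values that have at least one event with to=root and from=msmith, so the outer search only retrieves and groups events for those users before the final filter is applied.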
Splunk App for Active Directory and the Top 10 Issues
I work a lot with the various people who plan, deploy, and support the Splunk App for Active Directory. Some issues come up quite frequently, so I thought it would be a good idea to give you a roadmap of things to check as you deploy your environment. I’ll go through each issue and how to check for it so that you can make your roll-out as smooth as possible.