Your Splunk Sandbox
When I was an admin, sometimes I wanted to Splunk things, but not in my production environment. Maybe I wanted to add data and define the corresponding sourcetype. Maybe I wanted to mess with some backend conf files. Maybe I wanted to muck around with a new version of a search or dashboard. Whatever the reason, I learned a few approaches that may be obvious for the Splunk Ninjas out there, but not so much for our adorable n00bs. Either way, if you find yourself hesitating to try something Splunky, then this post is for you.
Build a Splunk Sandbox
Ideally, you’re installing Splunk on your local workstation (desktop/laptop), but if your company hasn’t given you access rights to install Splunk, then see if …
Writing Actionable Alerts
Is your Splunk environment spamming you? Do you have so many alerts that you no longer see through the noise? Do you fear that your Splunk is losing its purpose and value because users have no choice but to ignore it?
I’ve been there. I inherited a system like that. And what follows is an evolution of how I matured those alerts from spams to saviors.
Let it be known that Splunk does contain a number of awesome search commands to help with anomaly detection. If you enjoy what you read here, be sure to check them out since they may simplify similar efforts. http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Commandsbycategory#Find_anomalies
Stage 1: Messages of Concern
Some of the first alerts created are going to be searches …
How’s my driving?
It was the summer of 2014. I was well into my big data addiction thanks to Splunk. I was looking for a fix anywhere: Splunk my home? Splunk my computer usage? Splunk my health? There were so many data points out there for me to Splunk but none of them would payoff like Splunking my driving…
At the time, my commute was rough. Roads with drastically changing speeds, backups at hills and merges, and ultimately way more stop and go than I could stomach. But how bad was my commute? Was I having as bad an impact on the environment as I feared? Was my fuel efficiency much worse than my quiet cruise-controlled trips between New York and Boston? …
Fixing Scripted Inputs in Tiered Deployments
The Splunk App for Microsoft Exchange has a useful lookup named ad_username. It takes the various forms that you can logon to a domain as (like DOMAIN\user and email@example.com) and normalizes them. Further, it then takes all the user aliases and normalizes them so adrian.hall is the same as ahall and that is the same as adrian. It’s really useful when you are trying to deal with domain accounts from a support functionality – you don’t have to know how they logged in – only what their official username is.
AD_Username is a scripted input written in Python and lives in the bin directory of the application directory. It relies on two files that live in the local directory called …
Search Command> diff
What’s the grooviest Splunk search command goin’ round? It’s diff man, can you dig it?
That’s right, diff. What other command is based on a *nix file comparison utility that’s been around since the early 70’s?
Splunk’s diff operates just like good ol’ diff does on a *nix platform – it compares two inputs and tells you what the differences are, in a very distinct format. But where *nix diff normally compares two files, Splunk’s diff compares the content of two events.
We can use diff to compare one field in an event to that same field in another event, or we can go for broke and have diff compare “_raw” – or the content of the entire event …
SplunkTalk – #69 – The Walking Dead
Ok… we’re officially never again going to say “we’re back”. Except for right now. We’re back. At Splunk’s 2013 User Conference, (a.k.a. “.conf”–get it… dot conf.. our configuration files 😛 ) a number of listeners came up to us and said “Yo… when’s the podcast coming back?!?!?!?” To that we replied, “well, how about now”. So with out further adieu, I, Michael Wilde, your faithful Splunk Ninja would like to introduce an amazing new co-host of SplunkTalk, Hal Rottenberg. (That’s long o in Rottenberg, as in O my gosh he’s great). This episode of SplunkTalk returns with an overview of our favorite features in the newly released “Splunk 6.0”, and a question about a Splunk 6.0 search head …
Clustering Optimizations in Splunk 6
One of the new features we introduced in Splunk 6 is the Simplified Clustering Management. This allows administrator to setup and monitor the health of the cluster through an easy to use, intuitive UI. In addition to the cool new UI, many performance optimizations were added to handle peer failures and recovery from such failures blazingly fast. In this blog post, I’m going to highlight two such performance optimizations.
1. First Searchable Copy Optimization
This optimization is all about making sure that at least one, complete searchable copy exists in the cluster so that business users can continue to use the data while the cluster master is handling peer failures.
Let’s take a look at this with an example. Assume …
Exporting Large Results Sets to CSV
You want to get data out of Splunk. So you do the search you want and create the table you want in the search app. The results are hundreds of thousands of rows, which is good. So you click on the Export button and download the results to CSV. When you open the file, you see 50,000 rows. Is this a common problem? Not really. It’s a large enough result set that most people want to keep it in Splunk for analysis. However, there are times when such a large export is required. You really don’t want to log on to the Splunk server to get it either. So how do you progress?
I recently bumped into this problem myself …
Microsoft Patch Tuesday! Are your servers patched?
Disk Space Estimator for Index Replication
One of the first questions customers ask when they start considering index replication is about storage requirements. Index replication keeps additional copies of data for redundancy purposes, but how would it affect the storage needs and what are the factors to consider in designing scalable storage architecture are the main questions. I’ll cover the important factors in this blog post.
There are two major dimensions to consider. First one is the replication policies and the second one is the data retention period.
Replication Factor (RF) and Searchability Factor (SF) control the replication policies. RF determines the number of raw data files to keep while SF determines the number of time series indexed files. For syslog data, the raw data …