Splunk your Google Analytics
Gain more insight into site performance and user activity by correlating Google Analytics data within Splunk.
A customer of mine recently wanted to understand more about the journey that retail consumers take when they arrive at the company's website. The customer recognized that consumers who have previously bought from the site will be more familiar with its design and layout than those visiting for the first time. In addition, consumers who went directly to the site would have greater brand engagement than those who were referred from an affiliate site.
If only we could implement a method to back up the data that gets submitted to Google Analytics, also sending it back to the local Apache web server logs …
What size should my Splunk license be?
This is a pretty common question in Splunkland. Maybe you’re an admin wondering how much license you’ll need to handle this new data source you have in mind for a great new use case. Or you’re a Splunker trying to answer this question for a customer. Or a partner doing the same. Given how often this comes up, I thought I’d put together an overview of all the ways you can approximate how big a license you need, based on a set of data sources. This post brings together the accumulated wisdom of many of my fellow sales engineers, so before I begin, I’d like to thank them all for the insights and code they shared so willingly. Thank you…
Your Splunk Sandbox
When I was an admin, sometimes I wanted to Splunk things, but not in my production environment. Maybe I wanted to add data and define the corresponding sourcetype. Maybe I wanted to mess with some backend conf files. Maybe I wanted to muck around with a new version of a search or dashboard. Whatever the reason, I learned a few approaches that may be obvious for the Splunk Ninjas out there, but not so much for our adorable n00bs. Either way, if you find yourself hesitating to try something Splunky, then this post is for you.
Build a Splunk Sandbox
Ideally, you’re installing Splunk on your local workstation (desktop/laptop), but if your company hasn’t given you access rights to install Splunk, then see if …
Writing Actionable Alerts
Is your Splunk environment spamming you? Do you have so many alerts that you no longer see through the noise? Do you fear that your Splunk is losing its purpose and value because users have no choice but to ignore it?
I’ve been there. I inherited a system like that. And what follows is an account of how I matured those alerts from spam to saviors.
Let it be known that Splunk does contain a number of awesome search commands to help with anomaly detection. If you enjoy what you read here, be sure to check them out since they may simplify similar efforts. http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Commandsbycategory#Find_anomalies
Stage 1: Messages of Concern
Some of the first alerts created are going to be searches …
How’s my driving?
It was the summer of 2014. I was well into my big data addiction thanks to Splunk. I was looking for a fix anywhere: Splunk my home? Splunk my computer usage? Splunk my health? There were so many data points out there for me to Splunk, but none of them would pay off like Splunking my driving…
At the time, my commute was rough. Roads with drastically changing speeds, backups at hills and merges, and ultimately way more stop and go than I could stomach. But how bad was my commute? Was I having as bad an impact on the environment as I feared? Was my fuel efficiency much worse than my quiet cruise-controlled trips between New York and Boston? …
Fixing Scripted Inputs in Tiered Deployments
The Splunk App for Microsoft Exchange has a useful lookup named ad_username. It takes the various forms you can use to log on to a domain (like DOMAIN\user and email@example.com) and normalizes them. It then takes all the user aliases and normalizes those too, so adrian.hall is the same as ahall, and that is the same as adrian. It’s really useful when you are dealing with domain accounts in a support role – you don’t have to know how they logged in – only what their official username is.
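To make the normalization idea concrete, here is a minimal Python sketch. It is not the app's actual ad_username code: the function names and the alias table are hypothetical, and it assumes the part of an email address before the @ matches the account name, which will not hold in every domain.

```python
# Hypothetical alias table for illustration only; a real deployment would
# build this from directory data rather than hard-coding it.
ALIASES = {
    "ahall": "adrian.hall",
    "adrian": "adrian.hall",
}


def normalize_logon(logon):
    """Reduce a DOMAIN-prefixed or email-style logon to a bare lowercase name."""
    name = logon.strip()
    if "\\" in name:
        # Drop the domain prefix: everything up to the first backslash.
        name = name.split("\\", 1)[1]
    if "@" in name:
        # Drop the mail domain: everything from the first @ onward.
        name = name.split("@", 1)[0]
    return name.lower()


def official_username(logon, aliases=ALIASES):
    """Map any known alias to its official username; unknown names pass through."""
    name = normalize_logon(logon)
    return aliases.get(name, name)


if __name__ == "__main__":
    for logon in ("DOMAIN\\ahall", "Adrian@example.com", "adrian.hall"):
        print(logon, "->", official_username(logon))
```

Keeping the logon-format cleanup separate from the alias lookup means each step can be checked on its own, whatever the real lookup does internally.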
AD_Username is a scripted input written in Python and lives in the bin directory of the application directory. It relies on two files that live in the local directory called …
Search Command> diff
What’s the grooviest Splunk search command goin’ round? It’s diff man, can you dig it?
That’s right, diff. What other command is based on a *nix file comparison utility that’s been around since the early ’70s?
Splunk’s diff operates just like good ol’ diff does on a *nix platform – it compares two inputs and tells you what the differences are, in a very distinct format. But where *nix diff normally compares two files, Splunk’s diff compares the content of two events.
We can use diff to compare one field in an event to that same field in another event, or we can go for broke and have diff compare “_raw” – or the content of the entire event …
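As a rough illustration of the idea (not Splunk's implementation), the Python sketch below compares one field of two events and returns the differences in unified-diff form. The event dictionaries, their values, and the diff_events helper are made up for this example; only the standard-library difflib module is used.

```python
import difflib
from typing import Dict, List


def diff_events(event_a: Dict[str, str], event_b: Dict[str, str],
                field: str = "_raw") -> List[str]:
    """Return unified-diff lines comparing one field of two events."""
    a_lines = str(event_a.get(field, "")).splitlines()
    b_lines = str(event_b.get(field, "")).splitlines()
    return list(
        difflib.unified_diff(
            a_lines, b_lines, fromfile="event 1", tofile="event 2", lineterm=""
        )
    )


# Two made-up events; "_raw" stands in for the full event text.
first = {"_raw": "user=alice action=login status=success", "status": "success"}
second = {"_raw": "user=alice action=login status=failure", "status": "failure"}

if __name__ == "__main__":
    # Compare the whole event text, then just one field.
    for line in diff_events(first, second):
        print(line)
    for line in diff_events(first, second, field="status"):
        print(line)
```

Passing a field name narrows the comparison to that field, while the default of "_raw" compares the entire event, mirroring the two usages described above.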
SplunkTalk – #69 – The Walking Dead
Ok… we’re officially never again going to say “we’re back”. Except for right now. We’re back. At Splunk’s 2013 User Conference (a.k.a. “.conf”–get it… dot conf… our configuration files 😛 ) a number of listeners came up to us and said “Yo… when’s the podcast coming back?!?!?!?” To that we replied, “well, how about now”. So without further ado, I, Michael Wilde, your faithful Splunk Ninja, would like to introduce an amazing new co-host of SplunkTalk, Hal Rottenberg. (That’s a long o in Rottenberg, as in O my gosh he’s great). This episode of SplunkTalk returns with an overview of our favorite features in the newly released “Splunk 6.0”, and a question about a Splunk 6.0 search head …
Clustering Optimizations in Splunk 6
One of the new features we introduced in Splunk 6 is Simplified Clustering Management. This allows administrators to set up and monitor the health of the cluster through an easy-to-use, intuitive UI. In addition to the cool new UI, many performance optimizations were added so that peer failures, and recovery from them, are handled blazingly fast. In this blog post, I’m going to highlight two such performance optimizations.
1. First Searchable Copy Optimization
This optimization is all about making sure that at least one complete, searchable copy exists in the cluster so that business users can continue to use the data while the cluster master is handling peer failures.
Let’s take a look at this with an example. Assume …
Exporting Large Results Sets to CSV
You want to get data out of Splunk. So you do the search you want and create the table you want in the search app. The results are hundreds of thousands of rows, which is good. So you click on the Export button and download the results to CSV. When you open the file, you see only 50,000 rows. Is this a common problem? Not really. A result set that large is something most people want to keep in Splunk for analysis. However, there are times when such a large export is required. You really don’t want to log on to the Splunk server to get it either. So how do you proceed?
I recently bumped into this problem myself …