Capturing Omniture (or Google Analytics, or Webtrends) Data into Splunk

I’ve spoken to many customers who love their client-side tracking tools (Omniture, Google Analytics, Webtrends, etc.) but also want to get that data into Splunk so that they can correlate web traffic data with other things and really see “the big picture”.  But how?  What are the options?  Basically there are four ways to go:

Option #1: CSV Export

Create a report in your client-side tracking tool of choice and export the data.  In Splunk, upload the data (“Manager > Add Data > From files and directories”) and voila, you may now visualize and correlate to your heart’s content.
Pros: Easy and fast access to Splunk’s correlation, visualization, and analysis features.
Cons: Not automated, not real-time, and limited access to the data.

Option #2: Automatic CSV Export

Some client-side tracking tools can generate CSV reports automatically and send them to a folder via FTP.  In those cases, you just point Splunk to the folder in question (“Manager > Add Data > From files and directories”).
Pros: Automated access to Splunk’s correlation, visualization, and analysis features.
Cons: Not real-time, and limited access to the data.

Option #3: API

Virtually all of the popular client-side tracking tools provide API access to their data.  If you’re handy with Python, Java, JavaScript, PHP, Ruby, or C#, you can leverage Splunk’s wide array of SDKs to pull data from the client-side tracking tool and index it directly into Splunk.  If you need to use a language not supported by a Splunk SDK, first of all let us know, and then write the script in the language you need, and have Splunk trigger the script at whatever frequency you’d like (“Manager > Data inputs > Script > Add new”).  You will need to wrap the script in a shell script first.
Pros: Automated access to Splunk’s correlation, visualization, and analysis features and (depending on the client-side tool’s API) possibly less limited access to the source data.
Cons: Not real-time.
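
To make the scripted-input route a little more concrete, here is a minimal Python sketch.  It relies on one thing: a scripted input indexes whatever the script writes to standard output.  Everything else is an assumption for illustration.  `fetch_rows()` is a hypothetical stand-in for a real call to your tracking tool's API, and the field names (`page`, `referrer`, `views`) are invented.  The key="value" layout is just one convenient choice, since Splunk can usually pick such pairs out as fields at search time.

```python
import sys
from datetime import datetime, timezone


def fetch_rows():
    """Hypothetical stand-in for a request to the tracking tool's reporting API.

    Replace this with a real call to your vendor's API; the rows and field
    names below are made up for illustration.
    """
    return [
        {"page": "index.html", "referrer": "splunk.com", "views": 42},
        {"page": "pricing.html", "referrer": "example.org", "views": 7},
    ]


def format_event(row, now=None):
    """Render one report row as a timestamped line of key="value" pairs."""
    now = now or datetime.now(timezone.utc)
    pairs = " ".join(
        '{}="{}"'.format(key, str(value).replace('"', "'"))
        for key, value in sorted(row.items())
    )
    return "{} {}".format(now.strftime("%Y-%m-%dT%H:%M:%S%z"), pairs)


def main(out=sys.stdout):
    # One line per row; each line written to stdout becomes one event.
    for row in fetch_rows():
        out.write(format_event(row) + "\n")


if __name__ == "__main__":
    main()
```

Scheduled as a scripted input, each run of a script like this would emit one event per report row, so the polling interval you pick determines how fresh the data is.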

Option #4: Capture at the Source

All client-side tracking tools function basically the same way: a user does something (loads a page, clicks on something, mouses over something, etc.), JavaScript is triggered, and then a call to a tiny, fake image is made to the tool’s servers.  The tiny, fake image call appends some key/value pairs which are captured and crunched by the analytics tool.  So, for example, this call:
http://www.trackingtool.com/fake_image.gif?page=index.html&referrer=splunk.com
would be processed by the tracking tool as a single page view to “index.html”, referred by “splunk.com”.  It’s really no more complicated than that.  So, the very best way to Splunk data from your client-side tracking tool is to “capture” that key/value pair string before it’s sent to the tool of choice.  There are myriad ways of doing this: if you are using a tag management system, for example, you typically have the option of logging each call to a text file before it’s sent out.  Then you can have Splunk continuously index that file.  If you aren’t using a tag management system, you can add a few lines of JavaScript and either use the Splunk JavaScript SDK to inject the event directly into Splunk or, again, write the data to a file for indexing later.  I cover that latter option in a little more detail in a related post: “Client Side Splunk!”
Pros: Real time, automated, full access to the data, and of course, Splunk’s correlation, visualization, and analysis features.
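
To show how little there is to the key/value string itself, here is a small Python sketch that takes a captured image call (the example URL from above) and flattens it into a log line that Splunk could index.  It assumes the call has already been captured as a full URL, for example by a tag manager's logging; the `tracker_host` field and the helper names are my own inventions, not part of any tool.

```python
from urllib.parse import parse_qsl, urlsplit


def beacon_to_event(url):
    """Turn a tracking-image URL into a flat dict of its key/value pairs."""
    parts = urlsplit(url)
    # parse_qsl also URL-decodes the values (e.g. %20 becomes a space).
    event = dict(parse_qsl(parts.query, keep_blank_values=True))
    # Hypothetical extra field: remember which tool the call was headed to.
    event["tracker_host"] = parts.netloc
    return event


def to_log_line(event):
    """Render the event as key="value" pairs, one event per line."""
    return " ".join('{}="{}"'.format(k, v) for k, v in sorted(event.items()))


url = "http://www.trackingtool.com/fake_image.gif?page=index.html&referrer=splunk.com"
print(to_log_line(beacon_to_event(url)))
# prints: page="index.html" referrer="splunk.com" tracker_host="www.trackingtool.com"
```

Appending lines like this to a file that Splunk monitors gives you the “capture at the source” flow described above; the same parsing idea applies if you do the capture in JavaScript instead.
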
Whatever method you use, you’ll find that once the data is in Splunk, you’ll be in a whole different universe of flexibility and insight.  When you have an aha moment, will you share it with us?  Email me at srussell@splunk.com.  I’d love to hear about the creative, innovative ways you’re using the data you already have to better understand your users.
