Splunk Ninja - Fields of Dreams

I spend a great deal of time using, learning and demonstrating Splunk, and recently users have been asking me “what can I do with fields?”, “how do I make them?” and “how do I tweak them?”. That inspired me to publish a new Splunk Ninja episode called “Fields of Dreams”.

In this episode, Splunk Ninja gives an all-out tour of “fields” in Splunk 4.0: how they work, how to use them, and some tips and tricks.

The ability for Splunk to handle multiple data formats all in a single search index and do “search time field extraction” is unique to the marketplace.

Additionally, you’ll see me take fields and use them to assemble a transaction with Cisco PIX firewall logs. I use the “| transaction” search command to link and calculate the duration of outbound TCP connections.

Comments, suggestions, or new Splunk Ninja video ideas welcome!

Note: The “fullscreen toggle” buttons on embedded videos often don’t work properly in blogs, including this one and my site http://splunkninja.com. I shoot all of my videos in 1280×720 (720p) resolution. If you would like to go directly to the episode so you can watch it in fullscreen or even download it, go here:

Splunk Ninja - Fields of Dreams

Blogged with the Flock Browser

My head is in the clouds? Help me RightScale

Update: If you’re interested in checking out the Recorded Webinar as a result of the news below, it is located here:

RightScale / Splunk Webinar

If you’re new to the cloud, and new to Splunk–or neither–spare an hour tomorrow, February 10th at 11am PST.  Splunk and RightScale will be putting on a pretty cool webinar about IT search in the cloud.  Infrastructure-as-a-service is becoming more popular as a solution to many challenges IT faces in the coming years.  Our friends over at RightScale have quite an amazing platform for managing cloud infrastructure.

RightScale makes it dead simple to get infrastructure deployed in the cloud, but once you’re up and running, what about your IT data: logs, configurations, messages, etc?  That’s where our partnership comes in.  RightScale is a Splunk Powered Associate, which means that if you want your Splunk in the cloud, you should check them out.



Specifically, any RightScale user can easily install the Free License of Splunk (limited to 500MB of indexed data per day) without downloading anything.  After a few clicks, your Splunk server will be installed and ready to receive data.  A few more clicks and you can install a Splunk forwarder on every server in your deployment; RightScale’s configuration system makes it a snap.  No manual SSH install needed, no configuration, it’s just done, “and you can’t shake a stick at that”.

Check out the webinar tomorrow, February 10th, 2009 at 11:00 AM PST.  I’ll be there, and I hope you will be too.

It’s time for a Boxee-ing match with Splunk!

And now for something completely different! In working with some interesting data generated by Boxee media center software, I found that we could use Splunk as a “Ratings Reporting Engine”.  Additionally, as Boxee is open source, I thought it might be handy to give their developers realtime access to my log data as it’s being generated.

Background:  Boxee is a cool, open source, media center software package that runs on AppleTV, Linux, Mac OS X and Windows (soon).  It lets you watch movies, internet video content, even Netflix.  Boxee itself generates some interesting log data.  Boxee also lets a viewer automatically send a message to Twitter when a program is being viewed.

What could we do with this?

Using Boxee’s own Logs:

  •   Detect errors so that the developers can see them live
  •   Calculate viewing duration on a local Boxee instance and some other cool reports

Using information in Twitter:

  •   Create reports that show most watched shows and most active users

     View all of this live right now in my public Splunk server

  
Local Boxee Logs 

In my setup, I have Boxee running on AppleTV.  Splunk is also running on AppleTV.  Splunk monitors the Boxee logs and forwards them up to my public Splunk server over TCP. Send yours up if you want!  When I looked at the Boxee log data in Splunk, there were a few events that piqued my interest.

When the “DVDplayer” program opens a file to be viewed, it records an event, and the same goes for when it closes a file.  Hmm.. Makes me think that, using Splunk’s “transaction” search operator, I could tie them together AND calculate the duration of viewing.  Smells kinda fun.  How does that work?
 
Here’s the search command that’ll make this one work:


dvdplayer (opening OR "closing video") NOT SQLite | rex "Downloads\/Boxee\/(?<title>[^\/]+)\/" | rex "Movies\/(?<title>[^\.]+)\." | rex "file\/get\/(?<title>[^\.]+)\." | transaction startswith="eventtype=\"open-movie\"" endswith="eventtype=\"movie-closing\"" maxpause=-1 maxspan=-1 | eval duration = duration / 60 | timechart max(duration) by title usenull=f

 
Prerequisites

  1. Create an event type called “open-movie” for any events that match this search:  “search = dvdplayer opening”
  2. Create an event type called “movie-closing” for any events that match this search: “search = dvdplayer closing NOT audio”

  

In English: find the dvdplayer opening or closing events, and get rid of the ones that mention SQLite, because there are some errors happening. Pipe to “rex” to extract the title of the program from the filename, then pipe to “rex” again to get more program titles, because I have movies in different directories (yeah, you can overload a field). Then pipe it to “transaction” and define the transaction as beginning with the event type “open-movie” and ending with the event type “movie-closing”, setting the pause and span to “-1” so built-in rules don’t get in the way.  Transaction will create a duration (in seconds), so we’d better divide that by 60 to get “minute resolution”, and then pipe to “timechart” to look at the maximum duration viewed by title.  This way, we’ll know what movies are popular locally, even if they’re watched multiple times.  (Breathe, you weren’t supposed to repeat that whole paragraph in one breath!)
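If the pairing logic is hard to picture, here is a rough Python sketch of the same idea. It is not how Splunk implements “transaction”; the event tuples and movie names are made up for illustration, and it takes the overall maximum per title rather than a maximum per time bucket the way “timechart” would.

```python
from datetime import datetime

# Hypothetical, simplified events: (timestamp, event type, title).
events = [
    ("2009-01-10 20:00:00", "open-movie", "MovieA"),
    ("2009-01-10 21:30:00", "movie-closing", "MovieA"),
    ("2009-01-11 19:00:00", "open-movie", "MovieB"),
    ("2009-01-11 19:45:00", "movie-closing", "MovieB"),
    ("2009-01-12 20:00:00", "open-movie", "MovieA"),
    ("2009-01-12 20:20:00", "movie-closing", "MovieA"),
]


def viewing_minutes(events):
    """Pair each open event with the next closing event; return (title, minutes)."""
    fmt = "%Y-%m-%d %H:%M:%S"
    sessions = []
    open_event = None
    # These timestamps sort chronologically as plain strings.
    for stamp, event_type, title in sorted(events):
        when = datetime.strptime(stamp, fmt)
        if event_type == "open-movie":
            open_event = (when, title)
        elif event_type == "movie-closing" and open_event is not None:
            started, open_title = open_event
            # Duration in seconds, divided by 60 for "minute resolution".
            sessions.append((open_title, (when - started).total_seconds() / 60))
            open_event = None
    return sessions


def max_minutes_by_title(sessions):
    """Keep the longest viewing session seen for each title."""
    longest = {}
    for title, minutes in sessions:
        longest[title] = max(minutes, longest.get(title, 0.0))
    return longest


sessions = viewing_minutes(events)
longest = max_minutes_by_title(sessions)
print(longest)
```

The second viewing of MovieA is shorter than the first, so it does not change the maximum for that title.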

Additionally, I created some reports that will allow the open source developers of Boxee to look at “Where the errors are coming from”.  I extract some info from the events:

error OR failed OR severe | rex "ERROR: (?<error_source>[^\:]+)\:" | rex "ERROR: \[(?<error_source>[^\]]+)\]" | top limit=10 error_source

 
In English: find errors, pipe to “rex” to create a field called “error_source” (do it again because there are two types of errors in Boxee), then pipe to a “top” report by error source, save it to a dashboard, and display it as “TABLE”.  Kinda handy, so the devs can see that most of my errors come from some “CGUIBoxeeViewState” objects. The SQLite errors are also quite annoying.
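For the curious, here is a small Python sketch of “two patterns feeding one field, then count the top values”. The log lines are invented to match the two error shapes, and I am assuming the later match wins when both patterns hit; in this sample each line matches only one pattern, so it does not matter.

```python
import re
from collections import Counter

# Hypothetical log lines in the two error shapes described above.
lines = [
    "12:00:01 ERROR: CGUIBoxeeViewState: failed to load view",
    "12:00:02 ERROR: [SQLite] database is locked",
    "12:00:03 ERROR: CGUIBoxeeViewState: failed to save view",
    "12:00:04 INFO: nothing to see here",
]

# Python spells named groups (?P<name>...) where the search above uses (?<name>...).
PATTERNS = [
    re.compile(r"ERROR: (?P<error_source>[^:]+):"),
    re.compile(r"ERROR: \[(?P<error_source>[^\]]+)\]"),
]


def extract_error_source(line):
    """Try each pattern in turn; a later match overwrites an earlier one."""
    source = None
    for pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            source = match.group("error_source")
    return source


sources = [extract_error_source(line) for line in lines]
# Rough equivalent of "top limit=10 error_source": count values, most frequent first.
top_sources = Counter(s for s in sources if s).most_common(10)
print(top_sources)
```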

Boxee Data on Twitter

If you’re asking yourself “what’s Twitter”, you are clearly not hip enough to be using “rex” or “transaction”.  Assuming you already know what it is, I’ll bet you didn’t know Twitter has a search engine (they bought it from Summize).  Twitter Search indexes all “Tweets” and lets you retrieve results.  Why?  Well, if you don’t listen to what people are saying publicly, you should start!  What are people saying about Splunk right now? See, that’s why Twitter is so valuable (not the “I’m sitting down to have Sabra with Amrit & David” posts most people do).

You can set up Boxee to “tweet” what you’re watching, and when you do, this happens:

 A message like this is posted to Twitter:   “jlarkins: watching Inherit the Halibut on Boxee” - about 1 hour ago.    

 
Pretty simple, and they’re all like that. Every message has the word “watching” followed by the title, followed by “on Boxee”. It also has a timestamp, which Splunk really likes.  If we run a search on Twitter and ask it for “watching * on boxee”, we should get nearly all of those messages.  Notice that in the upper right of the Twitter Search page, there’s a “feed for this query” link.   If we run this search http://search.twitter.com/search.atom?q=watching+*+on+boxee  we’ll get back an Atom feed, which is like RSS but technically better. (Follow me, kids, this is going somewhere cool.)

 The results of that search yield an Atom feed with XML for every Twitter message that looks like this:

<entry>
<id>tag:search.twitter.com,2005:1082935045</id>
<published>2008-12-28T22:50:05Z</published>
<link type="text/html" rel="alternate" href="http://twitter.com/kiranboxee/statuses/1082935045"/>
<title>watching The Onion Movie on Boxee. check it out at http://www.imdb.com/title/tt0392878</title>
<content type="html">&lt;b&gt;watching&lt;/b&gt; The Onion Movie &lt;b&gt;on&lt;/b&gt; &lt;b&gt;Boxee&lt;/b&gt;. check it out at &lt;a href="http://www.imdb.com/title/tt0392878"&gt;http://www.imdb.com/title/tt0392878&lt;/a&gt;</content>
<updated>2008-12-28T22:50:05Z</updated>
<link type="image/png" rel="image" href="http://static.twitter.com/images/default_profile_normal.png"/>
<author>
<name>kiranboxee (kiranboxee)</name>
<uri>http://twitter.com/kiranboxee</uri>
</author>
</entry>

Look at all that data: there’s the author’s name, there’s a timestamp, and there’s the title of the movie as well. Or rather, there’s that “watching The Onion Movie on Boxee” message in there.

Splunk Comes In Handy
Indexing that stuff: Using Erik Swan’s “Web Page Monitor (webping)” application on SplunkBase, I’ve configured my Splunk server to eat the output of http://search.twitter.com/search.atom?q=watching+*+on+boxee every 300 seconds (5 min).  Since Twitter search is only going to give me back about a page full of results, and those results change a lot, I decided every 5 minutes was fine. It turns out that might be too frequent; you’ll see why soon.  I did have to configure props.conf to know where to break events (BREAK_ONLY_BEFORE=\<entry\>), but once I had that done, the XML events that show each Twitter post on movie viewing were indexed by Splunk.  If you didn’t know, we have a Python search operator called “xmlkv” which will take those XML elements and turn them into fields. For my purposes, I won’t be using that operator.
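To show what “break only before <entry>” means in practice, here is a loose Python imitation. It is only a sketch of the idea, not Splunk’s line-merging code, and the two tiny entries (with their ids) are made up.

```python
def break_events(text, marker="<entry>"):
    """Start a new event at every line that contains the marker."""
    events, current = [], []
    for line in text.splitlines():
        # A marker line closes the event collected so far and opens a new one.
        if marker in line and current:
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events


# Hypothetical two-entry feed body, trimmed down to the id element.
feed = """<entry>
<id>tag:search.twitter.com,2005:1</id>
</entry>
<entry>
<id>tag:search.twitter.com,2005:2</id>
</entry>"""

events = break_events(feed)
print(len(events))
```

Each multi-line `<entry>` block comes out as one event instead of one event per line, which is what makes the later field extraction possible.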

Searching -  If we run the search source="http://search.twitter.com/search.atom?q=watching+*+on+boxee" over a 7 day period, we get way more than 50k results. Why? Because we’re indexing a search engine, and there’s a good chance we have a lot of duplicates in there (if I back off my ping time, I might have fewer).

Sidebar:  every Twitter message has a unique number and URL. Look up there and find the “href” item in the “link” element. That’s it.

 Another Splunk search operator you probably didn’t know about is called “dedup” which will take search results and de-duplicate them based on the contents of a field.  This search:

source="http://search.twitter.com/search.atom?q=watching+*+on+boxee" | dedup href
 

Yields only 321 unique results in the past 7 days. That’s more like it!  By using some field extraction with multiline regex searching, we’re pulling out “username” and “title” and then graphing them.
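Here is a hedged Python sketch of the two steps: pull “href”, “username” and “title” out of each event with regexes, then de-duplicate on “href”. The regexes are my own guesses at a workable pattern, not the extractions configured on my server, and the second status URL is made up. In this sketch, records without the field are dropped.

```python
import re

# Hypothetical events, trimmed to the two elements the extraction needs.
events = [
    '<link href="http://twitter.com/kiranboxee/statuses/1082935045"/>\n'
    "<title>watching The Onion Movie on Boxee. check it out</title>",
    '<link href="http://twitter.com/kiranboxee/statuses/1082935045"/>\n'
    "<title>watching The Onion Movie on Boxee. check it out</title>",
    '<link href="http://twitter.com/jlarkins/statuses/1"/>\n'
    "<title>watching Inherit the Halibut on Boxee</title>",
]

HREF = re.compile(
    r'href="(?P<href>http://twitter\.com/(?P<username>[^/"]+)/statuses/\d+)"'
)
TITLE = re.compile(r"<title>watching (?P<title>.+?) on Boxee", re.IGNORECASE)


def extract_fields(event):
    """Run each pattern over the whole multi-line event and collect named groups."""
    fields = {}
    for pattern in (HREF, TITLE):
        match = pattern.search(event)
        if match:
            fields.update(match.groupdict())
    return fields


def dedup(records, field):
    """Keep the first record seen for each value of the field."""
    seen = set()
    unique = []
    for record in records:
        value = record.get(field)
        if value is None or value in seen:
            continue
        seen.add(value)
        unique.append(record)
    return unique


unique = dedup([extract_fields(event) for event in events], "href")
usernames = [record["username"] for record in unique]
titles = [record["title"] for record in unique]
print(usernames, titles)
```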

Boxee Rating Reporting

In my Splunk server I have a “Boxee” dashboard, consisting of a few saved searches that reveal statistics about user activity gleaned from Twitter.  Check in from time to time, and you may see more.

Top programs viewed in past 7 days - via Twitter:  source="http://search.twitter.com/search.atom?q=watching+*+on+boxee" | dedup href | timechart count(title) by title useother=f usenull=f

Top 10 Viewers in the past 7 days - via Twitter:  source="http://search.twitter.com/search.atom?q=watching+*+on+boxee" | dedup href | top limit=10 username

If you hadn’t figured it out, I’m a pretty big fan of Splunk. It’s just so darn useful versus a lot of other tools that deal with IT data.
 
So what did we learn (other than Wilde uses Twitter)? OK, seriously, what did we learn:

Splunk Search language commands

  1. Transaction
  2. Dedup
  3. Timechart count
  4. Timechart max
  5. Eval

Splunk Applications:

  • Web Page Monitor (Webping)
    It appears that, in my application of webping, I could probably back off my ping time to something like once an hour, because I have a lot of dupes.

Do something cool with Splunk.  It causes you to read the docs, learn stuff you didn’t think you needed to know.  Got questions, let me know–I’m happy to help.

Disclaimer:  In regards to what may appear as the viewing of copyrighted material, any and all names, characters, places, locations, locales, business establishments, organizations, associations, groups, entities, dominions, states, nations, governments, beliefs, circumstances, conditions, and events portrayed in this story, text, writing, symbol, image, or illustration are either fictitious or fictitiously used. Any resemblance to real or actual persons (living or dead) are pure coincidence. Any resemblance to real or actual character, characters, place, places, location, locations, locale, locales, business establishment, business establishments, organization, organizations, association, associations, group, groups, entity, entities, dominion, dominions, state, states, nation, nations, government, governments, belief, beliefs, circumstance, circumstances, condition, conditions, event, or events that exist, exists, existed, have existed, or will exist are pure coincidence. Any resemblance to reality is pure coincidence.

Splunk Ninja - EVENTually I will be TYPEcast

Welcome to another episode of Splunk Ninja.  I received an email from a customer yesterday indicating they wanted a better way to deal with “noise” in their logs.  For this customer, filtering out events before they are indexed was not the answer. They need to retain every event, but not necessarily deal with all of them.

It brought me to a component of Splunk’s technology that, in my unscientific survey, not too many customers use very often: Event Types.  While you can read all about them in our documentation, I figured I’d give you my thoughts and explain them in terms that I myself can understand.  You’ll see a few examples of how to locate and create event types using the “punct” field attached to every event.  Additionally, we’ll cover how cool the “typelearner”, or “Discover Event Types”, feature is.

There’s a lot you don’t know about in your log data, and event types and the typelearner can help focus your vision into your IT data.  Comments welcome as always.  T-shirts to all commenters!

Update: Here’s some advice from David Carasso, father of crawl, eventtypes, and lots of other cool learning technology at Splunk.

  1. Consider tagging these boring eventtypes as “boring”, and then filter results by “NOT eventtypetag=boring”.
  2. Finally, when making eventtypes, it’s always a good idea to make the search as generic as possible, while still getting just the events you want.  If you can avoid sourcetypes, punctuation, and extracted fields, your eventtype is easier to share, in that you don’t have to also share your props.conf, sourcetypes.conf, and transforms.conf, but maybe that’s a minor issue.

Update:  According to Splunk lore, taken from the historical archives, safely guarded by the Knights of the Splunk Templar, David Carasso may in fact also be the father of “the search language, transaction search, sourcetype classifier, timestamping, multiline event splitting”, and the phrase “take the sh out of it”.

Got Salesforce, Got Mac.. need help. Here you go!

We’ve recently switched over to Salesforce.com, which I’m pretty satisfied with, so I’ve been searching for tools that help me interact with Salesforce.com via the software on my Mac, such as “Microsoft Entourage 2008” and my favorite application, “Quicksilver”.

Simon Fell, over at PocketSoap.com has created a bunch of tools for the Mac user that help integrate Salesforce.com with the stuff you do locally on your Mac.

Maildrop

Maildrop is pretty cool, because it logs you in, and integrates with Entourage.  It adds a special menu that provides functionality for Notes, Events, Cases, Contacts, and Email.   Most importantly, this video will show you how to setup Maildrop, how to use it, and how Entourage can work with Salesforce.com




Salesforce Plugin for Quicksilver

Simon Fell’s Salesforce.com Plugin for Quicksilver is also a pretty sweet add-on for Quicksilver. If you don’t know how awesome Quicksilver is, watch my productivity video. This plugin allows search in Salesforce.com directly from Quicksilver and file upload to your documents folder! All with a few key clicks–per usual in Quicksilver. This video will show you how to setup the plugin for Quicksilver and a bit about how it works.



The Successforce dudes that own the Twitter account also sent me a link to other Mac tools for Salesforce.com up on their site (where Maildrop is also featured)

All My Regex’s Live in Texas

Put down that O’Reilly book about RegEx, quit googling, and saddle up!  Ninja’s going Texas style today with a new video on Regular Expressions, or REGEX.   Since Splunk is the ultimate swiss army knife for IT, or rather the “belt” in “blackbelt”, I wanted to share with you how I learned about Regex and some powerful ways to use it in your Splunk server.

I did have an O’Reilly book on Regex, and I have spent a great deal of time on the web looking up how to do regex. Still, I like the easy way, and since I’m a visual guy (to no surprise), I have found some great tools that help me: RegexBuddy by JGSoft and Reggy (free on Google Code).  RegexBuddy will teach you Regex better than anything else, and Reggy is your shuriken.

Using those tools to help me develop a proper RegEx, I can take what I’ve learned and apply it in Splunk.  By no means is being a ninja required to use Splunk, but any IT person worth their salt has some special tools and talents they employ to take software products like Splunk to the next level.



This video will break it all down for you and should give you a few advanced ways to use Splunk that I’ll bet you didn’t know about.

By the way, not only did I never think I’d live in Texas, how the heck did I end up parodying a song title by George Strait?  If you don’t get it, listen to the song.




Shout out to the ninjas at University of Texas, Austin who dig Splunk!   Splunk ‘em Horns!






Update:  “@shadejinx” on Twitter asked: “Can you extract multiple fields with the Rex format?”
Answer:  Of course you can. Guess how?  Think for a bit (this is how I figured this out)… aha!  Just add another “| rex” at the end of that search.  In the video above, this scenario is presented:

Event:

   :: ... :   ...  :::::  ...7

In the video example, I’d like to extract the DHCPACK (and other variations) and create a field called “dhcp_action”, so this search is run:

Search:

source="/mnt/log/splunk-interop/2008-lv-messages" dhcpd via | rex "dhcpd:\s(?<dhcp_action>\S+)"

But what if, in the same search, I wanted to extract that final IP address, the device through which we requested the IP address?  Let’s call it “dhcp_subnet_host”.  Easy, the Splunk search language works as you’d expect it to.  Try this:

Search:

source="/mnt/log/splunk-interop/2008-lv-messages" dhcpd via | rex "dhcpd:\s(?<dhcp_action>\S+)" | rex "via (?<dhcp_subnet_host>\S+)"

The result is that, in the same search, I’m able to extract two fields, which is especially handy if I have some variance in where that subnet host appears.  By doing it this way, I don’t have to write the “mother of all regexes” to come up with the perfect match. Just string searches together and you’re ropin’ cattle... or log events!
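The same “one small regex per field” trick works outside Splunk too. Here is a minimal Python sketch; the dhcpd line is a made-up stand-in (the real event from the video is not reproduced here), and Python spells named groups `(?P<name>...)`.

```python
import re

# Hypothetical dhcpd line, invented for illustration only.
event = "Jan  1 00:00:00 host dhcpd: DHCPACK on 10.0.0.5 to 00:11:22:33:44:55 via 10.0.0.1"

# One small pattern per field, applied one after the other like chained "| rex" commands.
EXTRACTIONS = [
    re.compile(r"dhcpd:\s(?P<dhcp_action>\S+)"),
    re.compile(r"via (?P<dhcp_subnet_host>\S+)"),
]


def extract(event, patterns):
    """Apply every pattern to the event and merge whatever named groups match."""
    fields = {}
    for pattern in patterns:
        match = pattern.search(event)
        if match:
            fields.update(match.groupdict())
    return fields


fields = extract(event, EXTRACTIONS)
print(fields)
```

Because each pattern is searched independently, it does not matter where in the line “via ...” shows up relative to “dhcpd: ...”.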

Splunkin at Amazon Start-Up

Today, http://splunk.tv is live at Amazon Start-Up at the Austin Music Hall.  Tune in, the SplunkNinja will be talking about what we’ve been doing with Amazon’s Web Services in a number of capacities.  This will be recorded, so if you can’t make it–tune in later.  3:10 PM CST.

Update:  The recorded video from yesterday’s presentation at Amazon Start-Up is here:

http://www.ustream.tv/recorded/704929

Note:  There’s about 13 minutes of delay… sorry, so fast forward to about 13:30 and you’re good

Caught on tape! Splunk Ninja vs. Sciencelogic Special Forces

A few weeks ago, Louis DiMeglio and I did a “quasi-podcast-ish” Q and A session discussing experiences at this year’s Interop shows (Las Vegas in May, and the upcoming New York show in September).   This session is over on Sciencelogic’s blog, so check it out. We tried really hard to edit the audio well, and who knows, we may have to turn this into a regular podcast.

Louis DiMeglio heads up the Sales Engineering team at Sciencelogic.

EM7, their flagship product, is a pretty cool all-in-one integrated management appliance that works hand-in-hand with Splunk live at Interop. Come check out the NOC we’re building for the September 2008 version of Interop.  It is THE largest IT tradeshow on the planet, the one where you’re likely to find people who know what they’re talking about, and a crew of “rag-tag” vendors that get together, build a real NOC with all sorts of different products and actually make it all work.

Interestingly, we NOC volunteers go through the same challenges that IT guys who do real work deal with every day.  I look at it as sort of a reality check for vendors of IT products, and a really neat experience to geek out for about a week twice a year on a real production network.

Yeah, catchy post title, I know.  Other than the new deodorant I’m trying this week, I had to do something to lure the millions of readers in to hear the golden voice of the Splunk Ninja.

And yes, I do wear Heely’s shoes when I go to tradeshows, and no, I’m not 40, yet…

Listen to the podcast!

Splunk Ninja - So You’re Interested in Video now?

This episode gives our faithful and inquisitive viewers a behind-the-scenes look at the Splunk Ninja’s ghetto-tech operation. Some viewers have been wondering how I put all of these videos together, what equipment to use and what software or websites to get started with.  Covered in this no-holds-barred, blockbuster epic, multi-dollar budgeted, long form tutorial are:

  • My experiences in getting to this point.
  • Things for you to consider and many options.
  • Tools I use in my “anti-studio”.
  • Production, hosting, viewing and all that nonsense.

It’s the longest video I’ve ever done. I really try to put content in front of the viewers that has substance, some level of staying power, relevance and, most of all, value for your attention, which I do cherish.

Thanks for watching, please comment in the timeline, with your keyboard or with the Seesmic video comment link below.  …and one more thing, send me a link to your videos!

This blog post and video is in fact not sponsored by Behringer Mic’s, Alesis Mixers, John Foley Software, Vara Software, AllocInit.com, Viddler–I’m just a big fan of their stuff!  — but is in fact sponsored by Splunk, The IT Search Engine.  Download it today.. it rox!
