Splunk, Developers, and SOA Apps

When most people first come across Splunk, the first users they associate with it are naturally operations, security, or compliance personnel, and Splunk lends itself well to their work. I was explaining what Splunk does to some software engineers, and the connection to their own engineered Service Oriented Architecture (SOA) applications did not come immediately. I told them that one of Splunk’s T-shirts reads “Be an IT Superhero. Go Home Early.” At that point, I had their interest.

Let’s get back to basics with one of the reasons Splunk exists, which applies not only to SOA but to all phases of multi-tier deployment. The typical developer may be involved in multiple stages of SOA development, producing applications and services that reside on multiple physical servers. When something goes wrong on any of these servers, the developer may get called to investigate but, for reasons of security, is not given access to them. So our friendly neighborhood developer calls someone in operations, who zips up the relevant log and trace files and sends them to the developer via an FTP server. The next steps involve getting the files, unzipping them, and running various home-grown scripts, usually some derivative of Perl, awk, and sed, to search for issues. If the results are not available for this server, or it turns out another server is the culprit, the whole process is repeated and can take a while.

Along comes Splunk to automate this whole effort and make IT search as easy as using a browser-based search engine. Splunk Light Weight Forwarders (LWF) are installed on every leg of the SOA process to monitor application-produced data. Each forwarder sends events to Splunk indexers in an automatically load-balanced manner controlled by Splunk. The developer finds the issue using a separate Splunk server called a Search Head, which is essentially a Splunk indexer that does not index but participates in a distributed search. Each event has a timestamp, the host it came from, a source file name, and a classification called sourcetype to narrow down the search. Issues that used to take hours to track down can now be found in a matter of minutes. A sample Splunk deployment for this setup is below.
Distributed Search

In this example, we have forwarders for an application server, a service bus, and a BPM product. This is just one example, as a SOA tier could just as easily have been a web portal or MQ Series. For completeness, we also have firewall data being forwarded. However, Splunk role-based access can restrict what the developer can see and do. For instance, all application data can be put into a separate index called application, and the developer’s role can be limited to searches where index=application. Further restrictions, such as originating host or sourcetype, can also be applied to the role.

One technical note: Splunk’s LWFs are indeed light weight in that they purposely cap the network bandwidth they use to send data to an indexer at a default maximum of 256 KBps. If you want to raise or lower this maximum data rate, copy SPLUNK_HOME/etc/apps/SplunkLightForwarder/default/limits.conf to SPLUNK_HOME/etc/apps/SplunkLightForwarder/local and change the settings in the copied limits.conf.

There you have it. Software developers who are constantly called upon to troubleshoot issues in production systems and SOA deployments can go home earlier, as they can have role-based access to data in their area of expertise. To make this even more compelling, Splunk can also monitor and alert on additions, changes, and deletions in the file system to speed up these types of investigations. This combination should help create IT Superheroes.

*************************

On an administrative note, in the past, I have written blog entries on various topics such as using JavaMail with Splunk or correlating with database records. For these entries I provided links to examples and applications that covered the topics. These have all been moved to the new Splunk Community Apps page.

40 Days of 4.0: Enriching Data with DB Lookups (Part 2)

Today, I’m writing as a guest blogger for Bob Fox to create part 2 of enriching data with the Splunk lookup command. Bob had already created part 1, which describes in detail with an example how to use the lookup command to enrich data from external CSV files. Today’s topic builds on the lookup command usage showing how to enrich indexed data at search time using an external database.

To begin with, it is a fact of life that some event data or log data may not reside in files, may not be broadcast on network ports, or even be imported uniquely via a scripted input. This data may, for legacy reasons, reside in a database. The often cited use case in this scenario is that the user would like to correlate some data that resides in Splunk as indexed events with similar fields that reside in a database. Even if a scripted input can be used to uniquely capture the data within the database and have it indexed within Splunk, there exists the issue of having redundant data that has been indexed twice only for the sake of heterogeneous correlation (join between a field in indexed data within Splunk and a field within data located in the database). Some people may not desire to index data once within a database and again elsewhere via extraction methods that end up taking disk space in the secondary index.

Examples of use cases where data resides within Splunk and related data resides within an external database are easily found. For instance, there may exist a security use case where an investigator is looking at events in Splunk and finds that a particular user has done something questionable. The investigator may then want to find the user’s address and phone number, which reside in a relational database. Using search within Splunk to quickly get to this database data is useful. Another example could be a proprietary system that logs all its access data, including the source IP addresses being used, in a relational database, while at the same time the company has firewall data being indexed within Splunk. A correlation between the two types of data within Splunk using the IP address as the common key should be possible. With these types of correlation in mind, I’ll go over the steps for setting up an example and provide a link to download it.

First, decide what field within indexed data within Splunk is going to be used to correlate and enrich data with an outside source. In my example, I have weather data and the field that I want to use is the city field within weather data. For purposes of illustration, my data looks like this:

Jul 27 08:35:09...city=Nice...

Splunk will automatically extract the city field with the value Nice at search time. What I’m interested in is finding the country of a city using an external database for the correlation. Again, for simplicity, I’ll use a terse database table.

city country
Nice France
Cambridge UK

Now, let’s move on to the Splunk setup. You’ll need to add an entry for the lookup to your props.conf configuration file just like in part 1. Mine looks like this:

[weather_entry]
lookup_table = countrylookup city OUTPUT country

Next, you’ll need to define what countrylookup does in your transforms.conf file. In this case, it will call an external Python program.

[countrylookup]
external_cmd = countrylookup.py city country
external_type = python
fields_list = city, country

The external command that you write, countrylookup.py, should reside in the bin directory of your application. The city and country terms next to it in the configuration file are the input field name headers used to produce a dynamic CSV table that is sent to a Python CSV standard output writer. The Python program gets its city field input via standard CSV input from Splunk, calls SQL to find the corresponding country, and produces the aggregate CSV output that contains the city with its correlated country. The complete example with instructions on set up can be downloaded from Splunkbase. My example uses the MySQL database, but you are free to change the code to use whatever database you require as long as there exists in this case a Python module to access the database. The final touch is to show you what the Splunk search looks like to get the new country field for my example.

sourcetype="weather_entry" city=Nice | xmlkv | lookup countrylookup city OUTPUT country

This will return France in a new field called country. There are a few design considerations that need to be addressed before I conclude today’s entry.

  • Use a database index in your DB on the field that is being used to correlate between Splunk and the database.
  • Have the Python program connect to a long running program (application server) that maintains a connection pool to the database to avoid having to reconnect to the database on each invocation of the lookup command.
  • If you know beforehand the number of uniquely matched events in the database will only be in the few hundreds, such as number of unique cities in my case, consider building an internal cache to avoid having to access the database for each corresponding select call. Splunk’s iplocation command does this and the source code for iplocation.py is included in your download of Splunk.

To wrap up, although we didn’t discuss the user-written Python lookup program in detail, its source code is part of the Splunkbase.com download to provide you with one example of how it can be written. The Splunk distribution also ships with external_lookup.py, which has a similar structure for taking CSV input from Splunk via standard input and producing CSV standard output. I hope today’s entry is useful for these types of use cases.
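
For a feel of the CSV-in, CSV-out structure before downloading anything, here is a minimal sketch. It is not the code from the Splunkbase download: the CITY_TO_COUNTRY dictionary is a hypothetical stand-in for the MySQL query, and the function names are made up for illustration.

```python
import csv
import sys

# Hypothetical stand-in for the database; the downloadable example queries MySQL instead.
CITY_TO_COUNTRY = {"Nice": "France", "Cambridge": "UK"}


def find_country(city):
    """Return the country for a city, or an empty string when there is no match."""
    return CITY_TO_COUNTRY.get(city, "")


def enrich(infile, outfile, lookup=find_country):
    """Read CSV rows with city and country columns and fill in the country column."""
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=["city", "country"],
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["country"] = lookup(row.get("city", ""))
        writer.writerow(row)


def main():
    # Splunk supplies the CSV on standard input and reads the result from standard output.
    enrich(sys.stdin, sys.stdout)
```

A real script would call main() at the bottom of the file and replace find_country with a function that runs the SQL query.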

Using File Contents as Input for Search

I’ve been asked a few times how best to search for events that may contain many different discrete values for a field. It’s essentially using an OR (disjunctive search) in the search language. For example, you can do this:

sourcetype=my_sourcetype (planet=mars OR planet=earth OR planet=saturn)

This works fine for a finite case where you only have a handful of planets, but what happens if the field’s search criteria change daily and may include hundreds of possible values? Certainly, using OR terms with over a hundred entries sounds impractical. A solution is to keep all the values you would like to use in the disjunctive search in an external file, and have the search language use that file as input to the search criteria. With Splunk 4.0, one way to do this out of the box is with the new lookup command. For an introduction to this command, please consult Bob Fox’s blog entry discussing example usage. For now, I will assume you have basic knowledge of its usage and will describe one possible way to express an OR over many possible values for a field.

First, use field extraction to extract the field in question. For our example I’ll use an IP address field. Next, create a CSV file in your SPLUNK_HOME/etc/apps/<app_name>/lookups/ directory. I created iptable.csv with the following sample content to be used for input.

ip, myip
192.168.1.105, 192.168.1.105
10.10.10.2, 10.10.10.2
192.168.1.10, 192.168.1.10

Since I’m not interested in creating a real mapping from one field (ip) to another (myip), I used the same value in both columns to conform to the syntactical usage of the lookup command. Now, in your SPLUNK_HOME/etc/apps/<app_name>/local directory you’ll need to create or modify two files. First, edit transforms.conf.

[search_ip]
filename = iptable.csv

Second, edit props.conf and use your sourcetype to start the stanza. I am using mail as my sourcetype.

[mail]
lookup_ip = search_ip ip OUTPUT myip

Now, from your browser, log into Splunk and reload the props.conf and transforms.conf files to pick up your new additions:

sourcetype=mail | extract reload=true

You are now ready to use your file as input to search for all events that contain ip addresses that were in your CSV file. One possible search is:

sourcetype=mail | lookup search_ip ip OUTPUT myip | search myip=*

The last search command will find all events that contain the given values of myip from the file. In essence, this last step will do your disjunctive search for you without having to type in a long sequence of OR terms. Finally, if your requirement is that you want to search on the top N (N is an integer) values for a field each day, Splunk can help you create the CSV input file. Simply run the following search assuming you want the top 100 values for IP in our example:

sourcetype=mail | top limit=100 ip | fields + ip

You can then copy and paste the values into your CSV file. In short, today’s blog entry gave you one possible way to use the content of a file as input for your disjunctive search. There may be more approaches and you are welcome to discuss them in the comments.
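
If copying and pasting becomes tedious, a few lines of Python can produce the two-column file from a plain list of values. This is only a sketch: the function name write_lookup_csv is made up for illustration, and it assumes you already have the values in hand (for example, exported from the top search above).

```python
import csv


def write_lookup_csv(values, outfile):
    """Write one row per distinct value, repeating it in both the ip and myip columns."""
    writer = csv.writer(outfile, lineterminator="\n")
    writer.writerow(["ip", "myip"])
    seen = set()
    for value in values:
        value = value.strip()
        # Skip blanks and duplicates so the lookup table stays small.
        if value and value not in seen:
            seen.add(value)
            writer.writerow([value, value])
```

To write the file itself, open it first, for example with open("iptable.csv", "w", newline="") as f, and pass f as the outfile argument.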

Indexing and Searching RSS feeds

Many companies produce RSS (Really Simple Syndication) feeds for their employees, partners, and customers. Moreover, these same companies consume RSS feeds from their suppliers, whether for personal news information or more timely business data. RSS is a great way to digest this information, but after a certain period, it may not be possible to find it again. If information from an RSS feed were indexed into Splunk on a regular basis, say every 10 to 30 minutes, it could be searched at any time. To accomplish this, I’ve created a simple Splunk application on Splunkbase that indexes some RSS metadata (date, title, link, and description). Simply download the application and install it into your $SPLUNK_HOME/etc/apps directory. Then, modify its inputs.conf file. For example:

[script://./bin/rss_sports.sh]
interval = 600
sourcetype = rssfeed
source = rss_sports
disabled = false

Next, create a script in the rss/bin directory that is called by the scripted input. A sample one has already been provided as follows:

#!/bin/sh

python $SPLUNK_HOME/etc/apps/Info/bin/rssfeed.py $SPLUNK_HOME/etc/apps/Info/bin/sports.txt

The script calls an already written Python script, passing in one argument: a file that contains a list of RSS feeds to index. Restart Splunk and look for your rssfeed sourcetype. The RSS metadata has already been delimited by tag=”value” for automatic field extraction. The provided Python script calls the open source feedparser library to parse each RSS feed supplied to it. Since this is all script based and re-entrant code, you can provide multiple scripts in inputs.conf, each eventually calling rssfeed.py with its own set of feeds, to index multiple sources simultaneously.
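
To illustrate the tag="value" output format, here is a minimal sketch of how one feed entry might be turned into a single event line. It is not the rssfeed.py from the download: the format_entry name is made up, and the entry is assumed to be a plain dictionary holding the four metadata fields rather than a feedparser object.

```python
def format_entry(entry):
    """Render one feed entry as a single line of tag="value" pairs."""
    parts = []
    for tag in ("date", "title", "link", "description"):
        value = str(entry.get(tag, ""))
        # Collapse whitespace and swap double quotes so each pair stays on one line.
        value = " ".join(value.split()).replace('"', "'")
        parts.append('%s="%s"' % (tag, value))
    return " ".join(parts)
```

A scripted input would print one such line per entry to standard output, and Splunk would index each line as an event.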

The next step is to search Splunk for information within a feed. Here’s an example screenshot using Splunk 4.0.x.

Splunk Web showing RSS Content

As seen on the left, fields have automatically been extracted. You can even set up alert conditions such as search for:

sourcetype="rssfeed" title="*inflation*"

For this example, Splunk will provide an alert for any feed event that has inflation in its title. As you can see, this capability provides the Splunk user with a powerful way to create an information base on any subject for future search.

Using Splunk to Trace SOA Applications

I have mentioned in past blog entries that Splunk can be used to contribute to the governance and indexing of Service Oriented Architectures. In this post, I will discuss a more common issue that pertains to log management, operations support, and troubleshooting. In a typical SOA deployment, you may have a situation where a user logs into a web site for procurement or purchasing, which kicks off a series of steps handled by different servers using heterogeneous technologies. One flow may include a web server, which initiates the request and sends a message to an application server. The application server then sends a message to an Enterprise Service Bus (ESB), which in turn, routes the message to a Business Process Management (BPM) solution.  The diagram below illustrates this basic flow.

SOA Flow

The complexity begins as soon as something goes wrong in the flow as each node in the SOA may represent a cluster and there may be multiple log files being generated to record what has occurred. Along comes Splunk to index all the log files using forwarders to send events to a central indexer. At this point, the user would have access to log events without having to log onto any production servers.

To make the situation more complex, what if you now wanted to trace the flow of all users at a certain point in time and correlate what each user’s session was doing on each node of the SOA flow? Splunk’s transaction search can be used in the Splunk Web application to do this rather easily. For purposes of example, I am assuming that you already have an eventtype called “SOA_Logs”, which is just a search that includes all the different sourcetypes for SOA log files. Also, the web server log file may at first have a session ID for the authenticated user, the application server may map this to a user ID, and the rest of the nodes in the flow may use this user ID to identify the same user. You would use Splunk’s field extraction capability to extract these fields from your logs at search time. With these requirements, we could use a transaction search command to correlate all users for a certain time span within one search:

eventtype="SOA_Logs" | transaction fields="session_id,user_id" connected=f maxspan=5m maxpause=5m

This search command will return groupings for all users with a session and user ID in a correlated manner, which follows the flow of the SOA. Each grouping will also give you a duration time so that you know how long an end to end flow took. Rather than go into the details for how transaction search works and the possible ways to use the above example, I invite you to read Eric “Maverick” Garner’s excellent blog entry discussing the steps in very readable language. What I’ve done is use the same example in the business context for troubleshooting SOA applications.

If you are already using Splunk for central log management in environments that are typical to this sample SOA flow, then out of the box, you will have this capability to trace your SOA applications to gain better visibility at the individual user level for events that have occurred. You can also pipe the results to a Splunk report command such as top. In summary, this approach can be valuable in troubleshooting complex deployments.

Using Splunk in a Screen Saver

Sometimes users of Splunk like to have Splunk tell them what is happening with their infrastructure without doing an ad-hoc search. The most obvious way to accomplish this is to use Splunk Alerts. An alert gets generated for a saved search that is executed over a configured period and matches user defined conditions.

Now suppose you just want to watch a saved search run on a periodic basis. One approach would be to have the Splunk Web application in the browser auto refresh itself. If the requirement is that you would like this to appear full screen in real time for others to see without giving them any other access to your desktop computer (as you may be away), a possibility is to have the search run in a screen saver. I’ll explain one way to get this to work.

First, decide what searches you would like to run in a screen saver and test them out in a browser. Next, create a permalink to the search by using the pull down menu next to the Splunk search bar and clicking on permalink. The URL will appear in the browser’s address bar and should be copied somewhere for safekeeping, such as Notepad on Windows. An example URL that has been “permalinked” by Splunk would be:

http://localhost:8000/?q=sourcetype%3D%22WinEventLog%3AApplication%22%20startminutesago%3D15&selStart=false&selEnd=false

Next, you’ll need to install a screen saver creation utility that allows web pages to be used as screens in a screen saver. For the purposes of testing, I’m using 2Flyer Screensaver Builder. All I did next was to use the saved URL above to create a web page for the screen saver and have it run every 30 seconds. This would allow me to execute a sequence of searches each being shown for 30 seconds at a time. After previewing the results, you can build the screen saver from the tool and you’ll get a screen such as below running from your screen saver.

Splunk Search in Screen Saver

Now, the next question is authentication. For the purposes of testing, I used the free edition of Splunk and didn’t have to deal with it. For the enterprise edition of Splunk, there is an application on Splunkbase called autologin that will allow automatic login into Splunk using a pre configured Splunk user and password. It is recommended to use an underprivileged user as your base user for security reasons. I got this working with Firefox as my default browser, but for some reason in IE, it had me go through one extra mouse click to accept the Splunk Certificate each time even though it had been added as an acceptable certificate and CA from the browser beforehand. Screen savers, by definition, don’t allow you to interact with them using mouse clicks as that would exit the screen saver. Since 2Flyer Screensaver Builder was based on the IE rendering engine, I didn’t try this any further.

In retrospect, I don’t recommend using an autologin feature to authenticate into Splunk as it does introduce a backdoor that you may not want, even if it is for an underprivileged user. A more acceptable approach would be to have the screensaver builder accept users and passwords to authenticate with any HTML form as part of building the screen saver. Overall, I write this blog entry to show you another interesting way to monitor activities in your operations center beyond traditional ad-hoc searches and alerts.

Audible Alerts

I was talking to some Splunk Users and mentioned scripted alerts as a very powerful way to invoke any program to get an alert. My thoughts then came to audible alerts. Since a scripted alert can call anything, it is possible that the script can call a program that can remotely send an alert that is audible, not just readable (like an email alert). I can think of a simple use case for this. Suppose you already have alerts that go to your cell phone via SMS through the email alert function of Splunk. Now, if you are at home and your cell phone battery is dead and it needs to be recharged, you may miss an important alert until you turn on your cell phone. As a back up, if an alert can go to some other device that is always on, such as a voice enabled device, you’ll have another opportunity to get the alert.

First, you’ll need a device that can translate text to speech via a remote API. I chose to use a Nabaztag:Tag for this function. What’s this? It’s a voice-enabled wifi rabbit that can receive multiple types of audible input including RSS, audio streams, and text to speech. What I did was set up a scripted alert with environment variables on what to say, which included the name of the saved search, the number of events matched, and a readable subject. To make it more interesting, I added environment variables for the days of the week (daily, weekdays, weekend) and the start and end hours to control when the alert can be active. In a real-life situation, you would want the alert to be active during your evening hours at home, such as 6 PM to 11 PM. The script then calls a Python program that checks whether the alert should be active, puts the final alert together as a string, and then calls a Nabaztag REST-based API to send the alert to the rabbit. The call to send the alert looks something like this:

http://api.nabaztag.com/vl/FR/api.jsp?sn=00039D4022DE&token=112231049046144&voice=UK-Shirley&tts=Splunk+Alert+...

The sn and token identify which rabbit to send the alert to, the voice identifies the accent and language, while the tts is the actual text to be read. When executed, the rabbit lights up and speaks the alert. It will also flash with color until a button is pressed which will again speak the alert in case you missed it the first time.
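
The two pieces of logic described above, checking the active window and assembling the URL, can be sketched in Python as follows. This is not the program from the Splunkbase download: the function names, the wording of the spoken text, and the interpretation of the day and hour settings are assumptions for illustration. Only the endpoint and the sn, token, voice, and tts parameters come from the call shown above.

```python
from datetime import datetime
from urllib.parse import urlencode

API_URL = "http://api.nabaztag.com/vl/FR/api.jsp"  # endpoint as quoted in the post


def is_active(now, days, start_hour, end_hour):
    """Return True when `now` falls on an allowed day and inside the hour window."""
    weekday = now.weekday()  # Monday is 0, Sunday is 6
    if days == "weekdays" and weekday > 4:
        return False
    if days == "weekend" and weekday < 5:
        return False
    return start_hour <= now.hour < end_hour


def build_alert_url(serial, token, voice, search_name, event_count, subject):
    """Assemble the text to speak and encode it into the request URL."""
    text = "Splunk Alert %s matched %s events. %s" % (search_name, event_count, subject)
    query = urlencode({"sn": serial, "token": token, "voice": voice, "tts": text})
    return API_URL + "?" + query
```

The alert script would call is_active(datetime.now(), ...) first and only request the URL, for example with urllib.request.urlopen, when it returns True.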

This example may sound a little playful as it uses a consumer gadget to serve the purpose. However, it illustrates that alerts do not always have to be textual in nature and can be as useful and creative as your imagination can conceive them. You can download the wifi rabbit example at Splunkbase and start using the same approach for your own audible device. Happy Belated Easter!

Nabaztag Rabbit

Change Management for SOA Configuration

In a previous blog entry, I mentioned that Splunk can participate as a Service Oriented Architecture (SOA) consumer and provided an example of using web services as a scripted input. In today’s entry, I’ll discuss a more administrative task that is quite native to Splunk: change management. As you may well know, Splunk can audit the file system and provide events on any change to a directory such as additions, updates, and deletes. Splunk can even monitor the files’ contents so that you can run the Splunk diff command to know exactly what has changed.

Now, what does this have to do with SOA? In a typical SOA setup, there will exist a number of configuration files. At a minimum, you may have Web Services Definition Language (WSDL) files, XML files, and XML Schema (XSD) files stored on the file system of a production machine. It is important that any change to these files be authorized and monitored to provide control over deployment.

Let me introduce a simple use case. First, use Splunk’s fschange as a data input in your inputs.conf to monitor a directory. For example,


[fschange:/Applications/splunk/wsdl]
sourcetype = wsdl_monitor
index = testing
disabled = false
_blacklist = [~]$
_whitelist = \.wsdl$
recurse = true
sendEventMaxSize = -1
pollPeriod = 600
fullEvent = true

This stanza means I am monitoring my WSDL directory recursively for changes to WSDL files (excluding backup files ending with ~) every 600 seconds and I would like the full event (the entire file) to be stored within Splunk. Notice, I have an index = testing criteria, because the default is that index = _audit, which may not be where you want to place file system change events. You can also use index = main if you want your main index to have these events. Now, after I restart Splunk, I can start making changes to my configuration files and Splunk will tell us what has changed.

My search index=testing sourcetype="wsdl_monitor" yields results such as:

[Sample events not preserved: the field values were lost from this copy. Each event lists its fields as name=value pairs, including action and path, followed by a line giving its source, host, and sourcetype.]

I can now tell who changed the file or directory and when it was done. Notice how action=add, action=update, and action=delete can be used as the basis to form event types, which can later be used for Splunk Alerts. This means that changes to the SOA configuration can be actively monitored. Moreover, if a change was supposed to happen, and a search for action=update path=/Applications/splunk/wsdl/iptocountry.wsdl yielded no results within the last 24 hours, the absence of a change may be worthy of an alert, as someone forgot to update the WSDL.

An interesting next step is to see what changes were made to a configuration file to see if the changes were innocuous. In my example, the search index=testing source="/Applications/splunk/wsdl/iptocountry.wsdl" | diff yields the following:

diff x y compares x to y
- indicates a line present in x but missing in y
+ indicates a line present in y but missing in x
! indicates a line that exists in both x and y, but contains different information

[diff output not preserved in this copy]

Notice that the change is for the documentation, so it is relatively harmless.

What this means is that you can now easily use your existing Splunk installation to monitor changes to your SOA configuration. The entire activity of monitoring SOA is part of SOA Governance. Keep in mind that monitoring file system changes does not in itself constitute an upward step in the SOA maturity model. However, with Splunk’s ability to monitor changes, instantly show the differences, provide active alerts on the changes, and produce reports for analysis, you have a valuable tool in your arsenal for moving a step closer to SOA maturity. For a more sophisticated and powerful approach to monitoring your SOA configuration, consider also using the Splunk for Change Management Application, as it provides predefined reports and dashboards to facilitate change auditing, change detection, change reporting, change validation, and incident response based on change events, change tickets, and configuration files. I hope this article and example bring you closer to using Splunk for monitoring SOA configuration files.

Everybody Splunk with the Splunk SDK

One of our partners in Asia came up with the interesting catch phrase “Everybody Splunk”, which we say internally. Today’s topic is about everybody using Splunk’s SDKs. In speaking to Splunk users, I’ve noticed that many of them are not aware that the SDKs exist. This topic has been discussed elsewhere in the development guide, but I’ll summarize. Splunk has SDKs for performing searches outside of Splunk Web and the CLI. They are available for

  • Java
  • Python
  • C#
  • PHP

If that doesn’t cover your favorite language, then, use the REST API which is the foundation for the SDKs. With the REST API, you can use any language you want that supports URI communication to search an index. The approach in each SDK is essentially the same. First authenticate, create the search string, iterate over the results, and then close the job. It’s that simple.
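
As a rough illustration of those four steps over the REST API, the sketch below only builds the HTTP requests in order and does not send them. The endpoint paths, the management port, and the Authorization header format reflect my understanding of the Splunk REST API and should be verified against the documentation for your version; the function names are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://localhost:8089"  # assumed management port; adjust for your install


def login_request(username, password):
    """Step 1: authenticate. The response is expected to carry a session key."""
    body = urlencode({"username": username, "password": password}).encode()
    return Request(BASE_URL + "/services/auth/login", data=body, method="POST")


def create_job_request(session_key, search):
    """Step 2: submit the search string. The response is expected to carry a search id."""
    body = urlencode({"search": search}).encode()
    return Request(BASE_URL + "/services/search/jobs", data=body, method="POST",
                   headers={"Authorization": "Splunk " + session_key})


def results_request(session_key, sid):
    """Step 3: fetch the results to iterate over once the job has finished."""
    return Request(BASE_URL + "/services/search/jobs/%s/results" % sid,
                   headers={"Authorization": "Splunk " + session_key})


def close_job_request(session_key, sid):
    """Step 4: delete the job so the server can release it."""
    return Request(BASE_URL + "/services/search/jobs/%s" % sid, method="DELETE",
                   headers={"Authorization": "Splunk " + session_key})
```

Each request would be sent with urllib.request.urlopen, with the session key and search id parsed from the earlier responses. The SDKs wrap this same sequence behind friendlier calls.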

This brings me to the heart of today’s topic: doing a search in an application. Developers are often asked to look at time series data files (e.g. log files or application generated events) via an application. They may end up using libraries that help read, parse, and search files. Even if the code is simple, files that are only a few MBs in size may grow to GBs within weeks, and any search over them will be sequential and probably slow. If the data were indexed within Splunk, which only requires pointing Splunk at it, then an SDK could be used to perform the search. Because the data is indexed, searches will be fast, and Splunk’s search capabilities and language provide a rich interface for constructing them. In this manner, Splunk becomes part of the application, where search is an integral part of the development and production results. In a future blog, I’ll go over an example using one of the SDKs.

Now, I just can’t resist adding some verse to Everybody Splunk. Don’t worry; I won’t quit my day job.

Everybody Splunk.
Superstars Dunk.
Everyone say hey.
Find the needle in the hay.
Let Splunk show you the way.

Splunk as a SOA Consumer

When you think about Service Oriented Architectures (SOA), Splunk doesn’t come to mind first. However, it is important to realize that any entity that is able to consume or produce services is by definition a participant in a SOA. With that said, let me state that Splunk can easily capture and index the output of a web service later used for search.

The next question is what the use cases are. Information that can be captured in a time series manner is ideal for Splunk. For example, suppose a warehouse is using an RFID reader to capture the movement of goods in and out of its facilities. This information usually drives a software business practice, which in turn may have web services to query the current state of what is happening. With Splunk, you could use a scripted input to capture the output of a web service. The script would call a web services client written in any of the usual web service friendly languages, and information such as the inventory of purchased goods would be captured in a time series manner every N seconds. After these snapshots in time of purchased goods are indexed, you can then run time delimited searches and trend reports from Splunk Web to provide instant analysis.

This example is just a limited introduction for what can occur with Splunk and SOA. As more and more people deploy SOA, services are available to capture metrics within a corporation. Splunk could be used as a quick and powerful mechanism to capture time series metrics to provide search capable insight into information flow derived via a service.  To show that this is a real possibility I’ve created on Splunkbase a weather example and a stock quotes example using public web services advertised from Xmethods. Weather output looks like this:

<?xml version="1.0" encoding="utf-16"?>
<CurrentWeather>
<Location>Nice, France (LFMN) 43-39N 007-12E 10M</Location>
<Time>Dec 09, 2008 - 12:00 PM EST / 2008.12.09 1700 UTC</Time>
<Wind> from the NNW (330 degrees) at 10 MPH (9 KT):0</Wind>
<Visibility> greater than 7 mile(s):0</Visibility>
<SkyConditions> mostly cloudy</SkyConditions>
<Temperature> 46 F (8 C)</Temperature>
<DewPoint> 24 F (-4 C)</DewPoint>
<RelativeHumidity> 42%</RelativeHumidity>
<Pressure> 30.00 in. Hg (1016 hPa)</Pressure>
<Status>Success</Status>
</CurrentWeather>

The Stock quote output is similar in style. Because the output is in XML format and a timestamp is already in the data, this was very easy to capture in Splunk. Feel free to try either example with your own Splunk installation using your own cities for weather and your own stock symbols. Hopefully, it will inspire you to create your own applications using SOA output for indexing into Splunk.

One last note is once XML is indexed within Splunk, any search can be piped to the Splunk xmlkv command. This will automatically create field extractions for you for all the elements in the XML stanza. These field extractions can next be used for your Splunk Reports.
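
To picture what such an extraction amounts to, here is a small Python sketch that maps each XML element to a field name and value. It illustrates the idea only and is not how the xmlkv command is implemented; the xml_to_fields name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def xml_to_fields(xml_text):
    """Map each leaf element name to its trimmed text."""
    root = ET.fromstring(xml_text)
    fields = {}
    for element in root.iter():
        # Skip container elements; keep only elements without children.
        if len(element) == 0 and element.text is not None:
            fields[element.tag] = element.text.strip()
    return fields
```

Applied to the weather output above, this would give fields such as Location, Temperature, and Status, which is the kind of result you would then use in reports.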
