Splunking from Python Part I
One of the neat things about splunk is that its search interface is a SOAP call. In this post I’m going to talk about using the python modules that ship with splunk to talk to splunk over this SOAP interface.
First off, you will need to set some environment variables so that you are running the version of python that ships with splunk:
export SPLUNK_HOME=<WHERE_YOU_INSTALLED_SPLUNK>
export PATH=$SPLUNK_HOME/bin:$PATH
export LD_LIBRARY_PATH=$SPLUNK_HOME/lib:$LD_LIBRARY_PATH
OK, now you should be good to go, so fire up python. Your python version should be 2.4.2. If it’s not, run “which python” from the command prompt to make sure you are using the python that shipped with splunk.
We need to do some setup before any searches can be run:
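If you would rather script the “which python” check, here is a minimal sketch. `is_bundled_python` is a hypothetical helper, not part of the splunk modules, and it assumes the bundled interpreter sits directly in $SPLUNK_HOME/bin, as the PATH setting above implies. It sticks to calls that exist in both the bundled 2.4.2 and Python 3.

```python
import os
import sys


def is_bundled_python(splunk_home, executable=None):
    """Return True if `executable` lives in SPLUNK_HOME/bin.

    Hypothetical helper mirroring the `which python` check; not part of splunk.
    """
    if executable is None:
        executable = sys.executable
    # Compare directories (symlinks in the directory path resolved) rather than
    # resolving the python binary itself, which may be a symlink to python2.4.
    bin_dir = os.path.realpath(os.path.join(splunk_home, "bin"))
    exe_dir = os.path.realpath(os.path.dirname(os.path.abspath(executable)))
    return exe_dir == bin_dir


if __name__ == "__main__":
    home = os.environ.get("SPLUNK_HOME")
    if home is None:
        print("SPLUNK_HOME is not set")
    else:
        print("Bundled python: %s" % is_bundled_python(home))
        print("Version: %s" % sys.version.split()[0])
```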
Python 2.4.2 (#1, Mar 11 2009, 21:45:07)
[GCC 4.0.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import splunk.search.splunkTest #initialize the python internals without using twistd
>>> import splunk.search.SearchCore as SearchCore #This is the module we are going to use to issue searches
If you want to run against a remote splunk server, or on different ports, you can run the following:
>>> SearchCore.SearchService.gSearchService._searchEngineURL = "http://<remote_host>:<searchengine_port>"
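If you switch servers often, a small helper can build that URL for you and catch a bad port early. This is a sketch only: `search_engine_url` is a hypothetical helper, not part of splunk, and the host and port in the test are placeholders rather than splunk defaults.

```python
def search_engine_url(host, port):
    """Build the value assigned to _searchEngineURL above.

    Hypothetical helper; host and port are whatever your remote
    splunk search engine actually uses.
    """
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %r" % port)
    return "http://%s:%d" % (host, port)
```

You would then assign `search_engine_url("<remote_host>", <searchengine_port>)` to `SearchCore.SearchService.gSearchService._searchEngineURL` in place of the literal string.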
The method on the SearchCore module that executes queries is called runQuery, and it takes two arguments:
def runQuery(queryString, userStr)
The userStr can be any string for now; in future releases it will probably be an auth token. It is the user that your searches will appear under in the searchhistory domain.
The queryString is where the magic happens. Basically, a query string contains three major elements:
QUERY: Terms following this are what you would type into the search box of the splunk web UI. They pull the matching ids into an id space internal to the query.
GET: Terms following this tell splunk what to extract from the ids in the id space into the result space.
OUTPUT: Terms following this specify how to format the results in the result space for output.
For a more detailed reference on the query syntax, check out: http://www.splunk.com/index.php/docs?doc=developer.html&vers=#58
Now for our first search:
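Because a query string is just these elements strung together, it is easy to assemble from code. A minimal sketch, assuming the elements appear in the order QUERY, GET, OUTPUT as they do in every example in this post; `build_query` is a hypothetical helper, not part of the splunk modules, and it runs on both the bundled 2.4.2 and Python 3.

```python
def build_query(query, get=None, output=None):
    """Assemble a query string from its QUERY, GET and OUTPUT elements.

    Hypothetical helper: each argument is the text that follows the
    corresponding keyword, e.g. output="splunkui::1.0".
    """
    parts = ["QUERY", query]
    if get:
        parts.extend(["GET", get])
    if output:
        parts.extend(["OUTPUT", output])
    return " ".join(parts)
```

Usage would look like `SearchCore.runQuery(build_query("meta::all", output="splunkui::1.0"), "brian")`.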
The meta::all key is a splunk key that every event in the system will have.
>>> SearchCore.runQuery("QUERY meta::all", "brian")
You will get the result “<queryResult></queryResult>” from this, as we have not specified an OUTPUT element. Note that unless you specify a domain to run these queries in, they will run in the default index (main).
Now run:
>>> SearchCore.runQuery("QUERY meta::all OUTPUT splunkui::1.0", "brian")  # We use the splunkui output here because we want to do things that the UI does, such as fetching events
Now the result is:
<queryResult><eventIndexedCount>58728</eventIndexedCount>
<ids>
</ids>
<projectedResultCount>1001</projectedResultCount>
<clampedStartTime>1049204073</clampedStartTime>
<clampedEndTime>1142300808</clampedEndTime>
</queryResult>
Of course your numbers will be different.
The projectedResultCount element is legacy and can be safely ignored.
The eventIndexedCount is the total number of events in this domain.
The clampedStartTime and clampedEndTime constrain the time range in which results for this query may appear.
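To use these fields from a script you can parse the XML, here pasted in as a string. A minimal sketch, assuming Python 3 (the bundled 2.4.2 predates xml.etree.ElementTree, which arrived in 2.5) and assuming the clamped times are Unix epoch seconds, which the sample values suggest (1142300808 falls in March 2006). `parse_summary` is a hypothetical helper; it skips projectedResultCount since that element is legacy.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def parse_summary(xml_text):
    """Pull the summary fields out of a splunkui::1.0 queryResult."""
    root = ET.fromstring(xml_text)
    summary = {}
    count = root.findtext("eventIndexedCount")
    if count is not None:
        summary["eventIndexedCount"] = int(count)
    for tag in ("clampedStartTime", "clampedEndTime"):
        value = root.findtext(tag)
        if value is not None:
            # Assumption: the clamped times are Unix epoch seconds.
            summary[tag] = datetime.fromtimestamp(int(value), tz=timezone.utc)
    return summary


# The splunkui::1.0 output shown above.
SAMPLE = """<queryResult><eventIndexedCount>58728</eventIndexedCount>
<ids>
</ids>
<projectedResultCount>1001</projectedResultCount>
<clampedStartTime>1049204073</clampedStartTime>
<clampedEndTime>1142300808</clampedEndTime>
</queryResult>"""
```

An empty “<queryResult></queryResult>” (no OUTPUT element) simply yields an empty dict.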
Note that there is still no event output. Let’s fix that:
>>> SearchCore.runQuery("QUERY meta::all GET events::0-2 OUTPUT splunkui::1.0 format::raw", "brian")  # format::raw tells the outputter to ignore all segment information
Results:
<queryResult><eventIndexedCount>19704</eventIndexedCount>
<ids>
</ids>
<projectedResultCount>1001</projectedResultCount>
<clampedStartTime>1041618608</clampedStartTime>
<clampedEndTime>1142302043</clampedEndTime>
<results type="events">
<result cd="0:1532081">
<segtext xml:space="preserve">Oct 14 16:29:38 liftoff sendmail[20336]: i9ENTcHf020336: from=<erik@transaction-engines.com>, size=667, class=0, nrcpts=1, msgid=<416F0BE2.3060306@transaction-engines.com>, proto=ESMTP, daemon=MTA, relay=h-68-167-140-171.snvacaid.covad.net [68.167.140.171]</segtext>
<timestamp>1141691378</timestamp>
<source cd="1" string="/opt/splunk/var/spool/splunk/maillog">
<dir>/opt/splunk/var/spool/splunk/</dir>
<file>maillog</file>
</source>
<host cd="1">localhost</host>
<sourcetype cd="1" base="sendmail_syslog">sendmail_syslog</sourcetype>
<type cd="38" wob=" v:de22 t:97 t:49 t:17882122 t:2336388840 t:63489930 t:4036439400 t:0 ">
<tags><tag>transaction</tag><tag>class</tag><tag>sendmail</tag><tag>com</tag><tag>size</tag><tag>net</tag></tags>
</type>
</result>
<result cd="0:2223455">
<segtext xml:space="preserve">Oct 18 15:14:27 liftoff sendmail[2527]: i9IMERup002527: from=<erik@transaction-engines.com>, size=3690, class=0, nrcpts=1, msgid=<41744043.3060306@transaction-engines.com>, proto=ESMTP, daemon=MTA, relay=h-68-167-140-171.snvacaid.covad.net [68.167.140.171]</segtext>
<timestamp>1141686867</timestamp>
<source cd="1" string="/opt/splunk/var/spool/splunk/maillog">
<dir>/opt/splunk/var/spool/splunk/</dir>
<file>maillog</file>
</source>
<host cd="1">localhost</host>
<sourcetype cd="1" base="sendmail_syslog">sendmail_syslog</sourcetype>
<type cd="38" wob=" v:de22 t:97 t:49 t:17882122 t:2336388840 t:63489930 t:4036439400 t:0 ">
<tags><tag>transaction</tag><tag>class</tag><tag>sendmail</tag><tag>com</tag><tag>size</tag><tag>net</tag></tags>
</type>
</result>
<result cd="0:3155870">
<segtext xml:space="preserve">Oct 21 14:03:53 liftoff sendmail[11725]: i9LL3quJ011725: from=<erik@transaction-engines.com>, size=2663, class=0, nrcpts=1, msgid=<41782438.7060303@transaction-engines.com>, proto=ESMTP, daemon=MTA, relay=h-68-167-140-171.snvacaid.covad.net [68.167.140.171]</segtext>
<timestamp>1141423433</timestamp>
<source cd="1" string="/opt/splunk/var/spool/splunk/maillog">
<dir>/opt/splunk/var/spool/splunk/</dir>
<file>maillog</file>
</source>
<host cd="1">localhost</host>
<sourcetype cd="1" base="sendmail_syslog">sendmail_syslog</sourcetype>
<type cd="38" wob=" v:de22 t:97 t:49 t:17882122 t:2336388840 t:63489930 t:4036439400 t:0 ">
<tags><tag>transaction</tag><tag>class</tag><tag>sendmail</tag><tag>com</tag><tag>size</tag><tag>net</tag></tags>
</type>
</result>
</results>
</queryResult>
Now you can see the actual event text in the segtext element in the results.
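The same parsing approach works for the events themselves. A minimal sketch, assuming Python 3; `parse_events` is a hypothetical helper. The sample is a trimmed copy of the first result above, with the `<`/`>` around the e-mail address escaped as `&lt;`/`&gt;`, because well-formed XML requires that (the listing above appears to show them unescaped).

```python
import xml.etree.ElementTree as ET


def parse_events(xml_text):
    """Return one dict per <result> in a splunkui::1.0 events response."""
    root = ET.fromstring(xml_text)
    events = []
    for result in root.iter("result"):
        source = result.find("source")
        ts = result.findtext("timestamp")
        events.append({
            "cd": result.get("cd"),
            "text": result.findtext("segtext"),
            "timestamp": int(ts) if ts else None,
            "host": result.findtext("host"),
            "sourcetype": result.findtext("sourcetype"),
            "source": source.get("string") if source is not None else None,
            "tags": [tag.text for tag in result.iter("tag")],
        })
    return events


# Trimmed from the first <result> above, with the message's angle brackets escaped.
SAMPLE = """<queryResult>
<results type="events">
<result cd="0:1532081">
<segtext xml:space="preserve">Oct 14 16:29:38 liftoff sendmail[20336]: i9ENTcHf020336: from=&lt;erik@transaction-engines.com&gt;, size=667</segtext>
<timestamp>1141691378</timestamp>
<source cd="1" string="/opt/splunk/var/spool/splunk/maillog">
<dir>/opt/splunk/var/spool/splunk/</dir>
<file>maillog</file>
</source>
<host cd="1">localhost</host>
<sourcetype cd="1" base="sendmail_syslog">sendmail_syslog</sourcetype>
<type cd="38">
<tags><tag>transaction</tag><tag>sendmail</tag></tags>
</type>
</result>
</results>
</queryResult>"""
```

On the full (properly escaped) output above it should return three dicts, since events::0-2 produced three <result> elements.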
If you want counts like the ones in the tab headings of the splunkui, use scheduler::1.0 as the OUTPUT term.
This will give you the following output:
<queryResult>
<schedResults>
<eventCount>10000+</eventCount>
<hostCount>1+</hostCount>
<sourceCount>1+</sourceCount>
<typeCount>239+</typeCount>
<sourceTypeCount>1+</sourceTypeCount>
<eventtagCount>62+</eventtagCount>
<starttime>12/31/1969:16:00:00</starttime>
<endtime>03/13/2006:18:50:48</endtime>
</schedResults>
</queryResult>
Note the + marks: they are the equivalent of the > signs in the UI, and tell you that there may be more than what is displayed.
You may mix the splunkui and scheduler outputs in a single query string.
Tune in next time, when I’ll explain some of the more advanced elements of the search language.
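Because of those + marks, the scheduler counts are not plain integers. A sketch, assuming Python 3, with hypothetical helpers: each count becomes a (number, is_lower_bound) pair, and starttime/endtime are parsed with the MM/DD/YYYY:HH:MM:SS layout seen in the sample. The times carry no timezone; 12/31/1969:16:00:00 is epoch zero rendered in US Pacific time, which suggests server-local time, so they are returned as naive datetimes.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def parse_sched_count(text):
    """Split a count such as '10000+' into (10000, True).

    The flag is True when a trailing '+' means the real count may be higher.
    """
    text = text.strip()
    if text.endswith("+"):
        return int(text[:-1]), True
    return int(text), False


def parse_sched_results(xml_text):
    """Return (counts, times) from a scheduler::1.0 queryResult."""
    root = ET.fromstring(xml_text)
    sched = root.find("schedResults")
    counts, times = {}, {}
    if sched is None:
        return counts, times
    for child in sched:
        if child.tag.endswith("Count"):
            counts[child.tag] = parse_sched_count(child.text)
        elif child.tag in ("starttime", "endtime"):
            # Layout inferred from the sample; assumed to be server-local time.
            times[child.tag] = datetime.strptime(child.text, "%m/%d/%Y:%H:%M:%S")
    return counts, times


# The scheduler::1.0 output shown above.
SAMPLE = """<queryResult>
<schedResults>
<eventCount>10000+</eventCount>
<hostCount>1+</hostCount>
<sourceCount>1+</sourceCount>
<typeCount>239+</typeCount>
<sourceTypeCount>1+</sourceTypeCount>
<eventtagCount>62+</eventtagCount>
<starttime>12/31/1969:16:00:00</starttime>
<endtime>03/13/2006:18:50:48</endtime>
</schedResults>
</queryResult>"""
```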
Brian

May 28th, 2008 at 12:56 am
[...] engineer Brian Murphy wrote up instructions for Splunking from Python. Just when I thought I was going to spend all week doing a worse job at [...]