Interesting. I am not a splunk customer but I do use a lot of virtualization, mostly in the VMWare world.
I have been wanting to explore the APIs more closely. I would be interested in learning more about this VM for splunk apps.
Rod
March 31, 2008
This is fantastic…I was just looking around for a way to do this!
Matthew
April 17, 2008
I too work with virtualisation, mostly VMWare, and would be interested in seeing what splunk could do with the data that could be extracted from the api/sdk. I haven’t got much programming experience, so the api quite often feels a bit alien to me, but I do have real-world experience of the kinds of things that people are interested in seeing reports or alerts on within their virtual environment and at the virtualisation layer.
Donal
April 23, 2008
We have been using splunk on a 50gb vmware machine, and it has run flawlessly, well at least until we ran out of disk space. We are now in the process of upgrading to a larger machine.
Albert Lavigne
April 25, 2008
I’m currently piloting a Splunk install on a virtual W2k3 server for a client.
I am also considering implementing this in my own office, where there are numerous vmware hosts and guests running at a given time, if I can work around the issue of the splunk install having slammed the door shut on my remote access (web and ssh both) to the linux machine I initially installed it on.
Tammi
April 30, 2008
we have several ESX servers and many VMs, I would love to see what the splunk VMware app can do for us.
since I’m not our org’s VMware administrator, I’m not too sure what information can be extracted. I’m thinking performance information (cpu/disk/network/mem utilization), changes to the VM configuration (for change mgmt)…perhaps stuff related to VMotion? if you detail the type of information you can extract, we should be able to provide better input.
so does the REST search interface support asynchronous dispatches yet?
the main reason i ask is that increasingly we are performing very large searches to feed into reports. my team has limited access to the production system, which means passing our searches via email to the operations group to dispatch via the command line. i have been tinkering with the SDKs and REST interface to try and put together something that will allow us to dispatch these searches from the web.
nick
July 17, 2008
Yes, the REST API is using dispatch under the hood so you can write a client application and get the benefit of long-running searches and job control. I’ll post some more details in the forum.
andrea
July 18, 2008
just as well you have “all you can eat” plans. in the land downunder we live in a time warp and have to pay for every friggin’ MB.
nick
July 29, 2008
bah, posted too early…
i was about to say most excellent.
I can see it now, while kicking back with a beer at the local pub a manager calls screaming that the sky is falling and the world as we know it is about to come to an untimely end. I whip out iSplunk and within seconds I can proclaim “it ain’t our platform baby”.
Of course in the real world it is always our platform and the world is already on a knife’s edge…
nick
July 29, 2008
Don’t get me started about telcos. I was in that industry a long time…
I’m starting simple with iPhone, something like what I did with the Dashboard widget. The new stuff in Splunk makes it easier because lots of people have already figured out how to parse Atom RSS feeds and I just have to read what I get back from the endpoint. For me the hard part is Cocoa, as I’m a big unix-head. And for iPhone there isn’t much code out there yet to learn from. I also want to tie it in to the push notification service so in a couple months when that is available I can have something on the Splunk side tell you actual useful information when you need it, without having to open the application and go looking for it.
Designing for mobile devices gives you a lot more constraints, limited resources means you have to really think about what is most important and present it to the user when she needs it, and only when she needs it. iPhone has a huge display by mobile standards, but compared to a normal desktop it’s still pathetically small. And building a usable UI without mouse and keyboard is tough.
I’m not the first to think that handheld system administration from a mobile device would be a good idea. There was even a paper about it. The Palm devices at the time could barely handle the complex UI required for a robust tool but expect to see many more sysadmin tools for iPhone. Even with the license restrictions that are annoying not just Open Source developers, it is still the most accessible mobile platform out there. (I’ll believe Android when I see a production device. And I’ll probably be writing code for that too.)
andrea
July 30, 2008
Thanks to Steven,
Known bug with IBM jvm.
Looks like the apache axis jar does not like the IBM jvm.
If you’re not using Sun’s jvm you might wait for me to post an update.
Working on it in the background
bash-3.00# echo $SPLUNK_HOME
/opt/splunk
bash-3.00# splunk -version
Splunk 3.3 (build 38914)
bash-3.00# echo $JAVAHOME
/usr/java5_64/
bash-3.00# which java
/usr/java5/jre/bin/java
bash-3.00# java -version
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build pap32devifx-20070725 (SR5a))
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 AIX ppc-32 j9vmap3223-20070426 (JIT enabled)
J9VM – 20070420_12448_bHdSMR
JIT – 20070419_1806_r8
GC – 200704_19)
JCL – 20070725
bash-3.00# pwd
/opt/splunk/etc/apps/vmware
bash-3.00# java -jar lib/splunk.jar
[ Wed Aug 13 08:51:00 ADT 2008 ] Begin Log.
Started
Exception in thread "main" java.lang.NoClassDefFoundError: sun.security.provider.Sun
at java.lang.J9VMInternals.verifyImpl(Native Method)
at java.lang.J9VMInternals.verify(J9VMInternals.java:66)
at java.lang.J9VMInternals.verify(J9VMInternals.java:64)
at java.lang.J9VMInternals.initialize(J9VMInternals.java:127)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:67)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:521)
at org.apache.commons.discovery.tools.ClassUtils.newInstance(ClassUtils.java:160)
at org.apache.axis.AxisProperties$1.run(AxisProperties.java:183)
at java.security.AccessController.doPrivileged(AccessController.java:193)
at org.apache.axis.AxisProperties.newInstance(AxisProperties.java:166)
at org.apache.axis.components.net.SocketFactoryFactory.getFactory(SocketFactoryFactory.java:75)
at org.apache.axis.transport.http.HTTPSender.getSocket(HTTPSender.java:187)
at org.apache.axis.transport.http.HTTPSender.writeToSocket(HTTPSender.java:404)
at org.apache.axis.transport.http.HTTPSender.invoke(HTTPSender.java:138)
at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
at org.apache.axis.client.AxisClient.invoke(AxisClient.java:165)
at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
at org.apache.axis.client.Call.invoke(Call.java:2767)
at org.apache.axis.client.Call.invoke(Call.java:2443)
at org.apache.axis.client.Call.invoke(Call.java:2366)
at org.apache.axis.client.Call.invoke(Call.java:1812)
at com.vmware.vim.VimBindingStub.retrieveServiceContent(VimBindingStub.java:23449)
at com.vmware.apputils.vim.ServiceConnection.connect(ServiceConnection.java:54)
at com.vmware.apputils.vim.ServiceUtil.clientConnect(ServiceUtil.java:36)
at com.vmware.apputils.AppUtil.connect(AppUtil.java:389)
at com.splunk.VMWareHostConnection.init(Splunk4VMI.java:198)
at com.splunk.Splunk4VMI.init(Splunk4VMI.java:309)
at com.splunk.Splunk4VMI.main(Splunk4VMI.java:456)
erik
August 13, 2008
You can move indexes around pretty easily now, but it’s an operation that can only be done with splunkd not running. So while a script to do it would be possible, any Splunk application (in the sense of something installed in splunk/etc/apps) would need splunkd to execute it.
andrea
August 28, 2008
It would be nice if you could write a script to “package” an existing index as an “app” which you can then easily move about.
Marinus van Aswegen
August 28, 2008
This is not a Splunk product, it’s me playing with iPhone at home and I have not had any time to work on it recently. There are also some technical issues in integrating with the push notification service that make publishing an iPhone client problematic (it must be uniquely tied to a particular back-end server.)
andrea
November 2, 2008
I have an input monitoring /var/log/…, with a whitelist and blacklist that are picking up the correct files.
It’s a central log server, so I’m trying to use transforms/props.conf to get it to set the hosts, but it won’t work.
I’ve tried using sourcetype in props.conf and setting sourcetype in the inputs.conf; neither works.
I’ve also tried [source::/var/log/...] in props.conf, hoping it’d pick up the many different sources in /var/log, but that doesn’t work either. The default host seems to override the extraction.
Another possibility is the complete absence of a .data file.
Each bucket should have a Hosts.data, Sources.data and SourceTypes.data.
for dir in $(find /opt/splunk/var/lib/splunk -name "db_*" -type d); do
echo -n "$dir"; ls "$dir"/*.data | wc -l;
done
You should see 3 for each line.
Joshua Rodman
February 9, 2009
FmyI:
for dir in $(find /opt/splunk/var/lib/splunk -name "db_*" -type d); do echo -n "$dir"; ls "$dir"/*.data | wc -l; done
Gerad
June 4, 2009
Hello,
I downloaded and installed the free splunk, I would like to use it for centralized reporting and monitoring. I downloaded the blue coat app but have no idea how to install…help please.
Thank you
Rob Keller
June 28, 2009
I’ve had success using the following command to rebuild the .data files from the index information:
recover-metadata
I’m not sure of all of the side-effects of this, but here are two considerations: (1) This will completely replace your existing .data files (assuming the index itself isn’t corrupt; in which case you can end up with less data than you had in the first place). So you should make a backup of your .data files first. And (2), all of your hosts, sources, and sourcetypes will be in all lowercase. I wrote a small python script to guess what the original case was based on other .data files in the top-level db directory. You can see/get a copy here: http://pastebin.ca/1481049
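For readers who cannot reach the pastebin link, the case-guessing idea can be sketched roughly as below. This is not the script linked above and it does not parse .data files; it assumes you have already collected the original-case names from intact .data files into a list. The function names and sample values are hypothetical.

```python
def build_case_map(known_names):
    """Map lowercase name -> original-case name, from names seen in intact files."""
    case_map = {}
    for name in known_names:
        # Keep the first spelling seen for each lowercase key.
        case_map.setdefault(name.lower(), name)
    return case_map


def restore_case(lowercased_names, case_map):
    """Swap each lowercased name for its known original spelling, if there is one."""
    return [case_map.get(name, name) for name in lowercased_names]


# Hypothetical sample values, for illustration only.
known = ["WebServer01", "/var/log/Apache/access.log", "Syslog"]
recovered = ["webserver01", "/var/log/apache/access.log", "unknownhost"]
restored = restore_case(recovered, build_case_map(known))
print(restored)
```

Names with no known original spelling are left in lowercase, so nothing is invented for them.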
Lowell Alleman
July 1, 2009
There’s no comma there after clientip, and each value on one line. I haven’t specifically tested using it to include things (basically as a whitelist) but offhand I don’t see why it wouldn’t work. It’s a valid subsearch and you can NOT or not, as you prefer.
n.b. I haven’t tried this on 4.0, although I have no reason to expect it wouldn’t still work.
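A minimal sketch of producing a file in that layout (a clientip header with no trailing comma, then one value per line), using Python's standard csv module. The helper name and the IP addresses are made up for illustration.

```python
import csv
import io


def write_clientip_csv(ips, out):
    """Write a one-column CSV: header 'clientip', then one IP per line."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["clientip"])
    for ip in ips:
        writer.writerow([ip])


# Hypothetical addresses; an in-memory buffer stands in for mycsvfile.csv.
buffer = io.StringIO()
write_clientip_csv(["10.0.0.1", "10.0.0.2", "10.0.0.3"], buffer)
print(buffer.getvalue(), end="")
```

Passing an open file handle instead of the buffer would write the same content to disk.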
andrea
July 9, 2009
If you downloaded the application from the Splunk UI, there is a green “Install App” button.
andrea
July 28, 2009
When is the free version expected to come out? i really liked version and I just did a reinstall, but it is no longer free.
Max
October 10, 2009
Erik, you might check out Nigel Kersten and his weblog/site for more info on Puppet. I believe he is one of the Puppet Masters. http://explanatorygap.net/category/puppet/
Funny, i ( tried ) to read Schumpeter Capitalism, Socialism and Democracy over the summer. Struggled a bit but could see the relevance for the modern entrepreneur. I should have read it more carefully since i missed the Creative Destruction point. Thanks for the tip, will try again.
erik
October 26, 2009
There’s something else Splunk could get from Puppet: ideas for configuration syntax.
Splunk 4 overextends INI syntax by putting extra syntax inside stanza names like [host::nyc*], inside key names (MORE_THAN_80 or REPORT-) and inside key values (LOOKUP-foo = mytable userid AS myuserid OUTPUT username AS myusername)
Two years ago, in 2007, Splunk did some soul-searching on layered configuration.
The problem is that configuration is domain-specific, and “using XML” really means “building a DSL around XML syntax”. Splunk just chose to abuse INI syntax for its DSL instead of XML syntax.
I think that Splunk looks constrained by INI syntax and could do more with a Puppet-like resource syntax. See http://reductivelabs.com/trac/puppet/wiki/DocumentationStart. The language reference says “Resources are fundamentally built from a type, a title, and a list of attributes”
It would be declarative, human-read/writeable domain-specific language, two-way convertible with the more machine-readable YAML (or XML) representations. Might feel like:
Although, I would add that if everything Splunk needs can be expressed in hand-written extended-syntax INI files, there’s no point in expending resources on a more general Puppet-like resource syntax (which may not even be a good fit for the Splunk domain). Another case of if it ain’t broken don’t fix it.
And it takes much the same amount of time to get used to an INI-based DSL as for any other (such as Apache, Puppet, or XML-based one).
Graham Poulter
October 27, 2009
Or not. The puppet syntax describes resources and instantiates them, but each named resource of a given type must occur once only. Because the resource declarations are unordered, one cannot make a props{ “foo”: …} in two places with one overriding the other. This is unlike INI syntax where a stanza in one file can be made to override the same-titled stanza in another file.
May as well delete the earlier comments, the Puppet resource syntax won’t do for cascading configuration files.
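The stanza-override behaviour of layered INI files can be illustrated with Python's standard configparser. This is only an analogy, not Splunk's own configuration loader, and the stanza contents are illustrative: each text is read in order, and a key in a later layer replaces the same key in the same-titled stanza of an earlier layer.

```python
import configparser


def layered_config(*ini_texts):
    """Parse INI texts in order; later texts override keys in same-named stanzas."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case exactly as written
    for text in ini_texts:
        parser.read_string(text)
    return parser


# Illustrative layers: a default file and a local override file.
default_layer = """
[host::nyc*]
TZ = UTC
SHOULD_LINEMERGE = true
"""

local_layer = """
[host::nyc*]
TZ = US/Eastern
"""

config = layered_config(default_layer, local_layer)
print(dict(config["host::nyc*"]))
```

Keys that the later layer does not mention keep their earlier values, which is the cascading behaviour an unordered, declare-once resource syntax cannot express.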
Graham Poulter
October 27, 2009
It would have been nice to have a link to the download!
Bill Clinton
October 28, 2009
Hey Erik,
In your scaling model are you forwarding the same source data to all indexers or are you splitting the sources up and sending 1/N of the sources to each indexer (where N is your number of indexers)?
Dale
October 28, 2009
Is there a limitation on the size of log files that splunk free can process?
nick fox
October 29, 2009
Hello
This is great! We did purchase a 5gb enterprise license for our critical systems but this free version is a good option for our test lab. Does it come with a real free beer coupon ;p ?
Enjoy!
Fernando Cabal
October 29, 2009
So, just to understand what “free” means and what it does…
The only difference that I see between 4.0’s [disabled] features and that of 2.x-3.x is that the alerting is disabled in 4.0 as opposed to the earlier versions where the alerting functionality was included. Do I have this right?
This was a somewhat useful piece, and I’m curious as to whether the specific decision to leave this out was made for any particular reason?
And, yes, I understand the verbiage about this being for “personal” use. The problem is that for small environments where we’re expected to try to find FOSS solutions for as many infrastructure applications as possible, there’s a big step there to the least-expensive license.
(Not to sound ungrateful, of course – just want to be clear on this – we do love the product…)
K. M. Peterson
October 29, 2009
Few replies in one…. ( Fernando, Nick and K.M. )
Fernando, you can always get a free beer here at Splunk. If you are ever in SF, come by, no coupon needed! We love it when users stop by for a drink.
Nick, there is no limitation on the size of files you can eat for any given day. Go ahead and index a terabyte. The limitations are that you cannot do that for 3 days in a rolling 30 day period – ya, a bit complicated. The free product is *designed* for ad-hoc use with large volumes or for lower volumes of continuous input. Feel free to bug me if that does not make sense.
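Since the rolling-window rule is “a bit complicated”, here is a rough sketch of one way to read it: count the days in the last 30 on which the indexed volume went over the daily allowance, and treat the third such day as the point where the limit bites. The daily allowance figure, the function names and the sample volumes are all assumptions for illustration; this is not how Splunk itself tracks licensing.

```python
from datetime import date, timedelta


def violations_in_window(daily_volume_mb, daily_limit_mb, today, window_days=30):
    """Count days in the rolling window whose indexed volume exceeded the limit."""
    window_start = today - timedelta(days=window_days - 1)
    return sum(
        1
        for day, volume in daily_volume_mb.items()
        if window_start <= day <= today and volume > daily_limit_mb
    )


def limit_reached(daily_volume_mb, daily_limit_mb, today, max_days=3):
    """True once the number of over-limit days in the window reaches max_days."""
    return violations_in_window(daily_volume_mb, daily_limit_mb, today) >= max_days


# Hypothetical volumes in MB and an assumed daily allowance of 500 MB.
today = date(2009, 10, 29)
volumes = {
    date(2009, 9, 1): 5000,       # outside the 30-day window, ignored
    date(2009, 10, 1): 1000000,   # one very large ad-hoc load
    date(2009, 10, 15): 900,
    date(2009, 10, 28): 200,      # under the allowance
}
two_days = violations_in_window(volumes, 500, today)
print(two_days, limit_reached(volumes, 500, today))

volumes[today] = 800  # a third over-limit day inside the window
print(limit_reached(volumes, 500, today))
```

Note that the size of any single day's load does not matter in this reading, only how many days go over.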
K.M., it was a long process trying to decide if we can/should pull alerting out of free. We have the tricky balance of trying to provide a great free product but also make some money along the way. It was not an easy or clean cut decision and we knew that people would be bummed. I can hint that you can work around it yourself using cron or equivalent; it’s not as easy but then it shouldn’t be too hard either. We are committed to providing a great and useful free product and finding the right balance of features will be an ongoing challenge. Perhaps most important is your feedback – do let us know what you want in, what features we are missing altogether, what does not make sense, etc. It’s really hard to build a great product without lots of input.
erik
October 29, 2009
Hi Dale, sorry for the late reply.
I recommend auto load balancing using splunk forwarders. This model will split the data from (n) forwarders evenly across (m) indexers. Splitting the data evenly has huge search performance benefits. In general there are lots of advantages to using splunk forwarders at the origin and using auto LB.
If you are using syslog it’s a bit trickier, and in that model i might still use a forwarder ( or two ) to listen for syslog but then have that forwarder auto LB across the indexers.
Not sure that helps or if i even answered the right question. If not, let me know and i’ll try again.
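To picture the even split from (n) forwarders across (m) indexers, here is a toy round-robin simulation. It only illustrates the idea of spreading data evenly; it is not Splunk's auto load balancing algorithm, and the forwarder, block and indexer names are hypothetical.

```python
from collections import Counter
from itertools import cycle


def distribute(blocks, indexers):
    """Assign each data block to the next indexer in turn (round-robin)."""
    targets = cycle(indexers)
    return Counter(next(targets) for _ in blocks)


# 3 hypothetical forwarders, each sending 100 blocks, spread over 4 indexers.
blocks = [f"fwd{f}-block{b}" for f in range(3) for b in range(100)]
per_indexer = distribute(blocks, ["idx1", "idx2", "idx3", "idx4"])
print(per_indexer)
```

With every indexer holding the same share, a search can be answered by all of them working on equal amounts of data in parallel, which is where the search performance benefit comes from.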
erik
October 29, 2009
Hi Andrea,
Do you know if development of the iPhone app has progressed, or is it on hold due to the release of Splunk 4? This is something we could really use as a small company. There are only a handful of technical people on staff. As a result our on call schedule rotates through several people who are not devs. Having an iphone app where they can quickly look at problems would be very beneficial so they can quickly see if something is a temporary hiccup that has recovered or a real system failure.
Todd Nine
November 1, 2009
Thanks for the thoughts Justin.
Open sourcing “Splunk” becomes complicated when you start to pull on that string – we wrestled with that decision for years when starting out. I’m not sure if I would do things differently if I had to do it all over again, but the pro/con lists make it a very close call.
We are starting now to publish some of our code as open source. I don’t know if we will ever get to open sourcing the engine, indexer, forwarder, … code. We have often talked about how cool it would be to provide a database to the community – we were huge fans of Sleepycat/Berkeley DB and often thought that would be a good model. Targeting developers by providing a Splunk engine is a fantastic idea, i’m just not sure what the model looks like – but i know there are lots of folks out there who would use it.
I’m working on 2010 planning and will add “stuff” to look at how to get our engine out to developers better – I’m not sure if open source is the answer, but will look at all options.
Thank you very much for bringing it up, we need your input to keep us heading the right direction.
Regards,
e
erik
November 2, 2009
Eric, great to hear! I’ve long been a fan of Splunk and often recommend it to our clients. Splunk is the only sane way to do large scale log analysis in my opinion.
A few years ago Splunk had promised an open source version. I fully understand the reasons why Splunk decided against this. However, now that you have achieved profitability and version 4.0 has been out for a few months, would Splunk consider honoring their original promise by releasing their older 2.0 or 3.0 versions under an open source license such as GPL or AGPL? Even if you held back the web interface (even though I’d love to see that opensourced as well) and only released the indexing engine, the CLI, forwarding/receiving, and API, I’m sure there are many FOSS projects and Linux distributions that would be excited to re-use your code.
I personally work on various open source projects and have longed to use Splunk in them. In particular we are about to start a complete rewrite of the BASE project (base.secureideas.net) and would love to replace our backend database (currently MySQL) with an opensource version of Splunk. I am also a lead developer of the Samurai-WTF Live CD project (samurai.inguardians.com) and would love to include an open source version of Splunk to collect the output of the various pentest tools. The distribution and modification restrictions of Splunk Free are simply show stoppers for these purposes.
I understand that Splunk may decide this isn’t possible, but since you run your company in such an open-to-the-community manner, I thought it was worth the effort to ask. Regardless of your decision, I will continue to be a huge fan and will always remain appreciative of your Free versions.
Justin Searle
November 2, 2009
Hi Erik –
I arrived here google searching about how to get my saved scheduled searches migrated to 4.0. I’m a bit disappointed about the lack of search-notifications for free licenses upgraded from 3.0 to 4.0x, and wish more was mentioned in this regard in the migration considerations sections of the manual.
Perhaps you could consider in a future release retaining search notifications for users who are performing an upgrade install for 3.x, but not marketing it to any new install free customers?
Paul
November 4, 2009
Hi Paul,
It was a hard decision to remove the notification and one not taken lightly. I hope that it was obvious that the feature was no longer available; we tried hard to explain this to folks before upgrade. I’m not sure if that feature will come back to the free product at some point. In the meantime, the only recommendation i have is to use some other scheduler to run the search for you. I know this is extra work but we needed to have some difference between the free and pay for version or we would not be able to keep all the engineers working on the free product.
I hope you understand and if you have other ideas please let me know.
Regards,
e
erik
November 4, 2009
Hi,
I’m thinking of giving this a try; however, we are running vsphere and esx 4.0. Has any work been done to confirm what does and doesn’t work?
Cheers
Ste jones
November 4, 2009
Hey Ste,
I’ll be posting a new version of the VMWare app in a week or so – i’d wait.
I’ll post back here when it’s posted – it won’t be long.
Thanks for the comment!!
e
erik
November 4, 2009
hi there andrea:
i found this program read some about it & installed wanting to know what makes my machine tick, can i after scanning begin making my machine a performance hotrod?
what type of apps. are you working on now, or will be in the future? Can i search the entire internet web for information without being tied into a network?
so Id like to write an app also..
jimoer i ke
jimoer`i`ke`
November 5, 2009
recover-metadata is the right solution, when:
1 – you know where the problem is (which bucket)
2 – the problem is in a real index of your data
Finding the problem is what the above posts are about, of course.
Recommendations for recover-metadata:
- you may want to copy your pre-existing .data files somewhere, in case of sadness.
- If you have a bucket that recover-metadata doesn’t work on, please consider whether you can provide it to us so we can : 1 – fix the bucket, 2 – fix recover-metadata
CAVEAT: recover-metadata likely to work poorly on the _internal index, because of its design. strings like source::foo in the event text are likely to fool it, and we have a certain amount of that in _internal. Luckily, you probably don’t care about _internal too much, as it’s only splunk logs.
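The “copy your pre-existing .data files somewhere” recommendation could be scripted along these lines before running recover-metadata. It is a minimal sketch: the helper name is hypothetical, and the demo runs against a throwaway temporary directory with placeholder files rather than a real bucket.

```python
import shutil
import tempfile
from pathlib import Path


def backup_data_files(bucket_dir, backup_dir):
    """Copy every *.data file from bucket_dir into backup_dir; return copied names."""
    bucket = Path(bucket_dir)
    target = Path(backup_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for data_file in sorted(bucket.glob("*.data")):
        shutil.copy2(data_file, target / data_file.name)  # preserves timestamps
        copied.append(data_file.name)
    return copied


# Demo on a temporary stand-in for a bucket directory (placeholder contents).
with tempfile.TemporaryDirectory() as tmp:
    bucket = Path(tmp) / "db_example"
    bucket.mkdir()
    for name in ("Hosts.data", "Sources.data", "SourceTypes.data"):
        (bucket / name).write_text("placeholder\n")
    copied = backup_data_files(bucket, Path(tmp) / "backup")
    backup_ok = all((Path(tmp) / "backup" / name).exists() for name in copied)
print(copied, backup_ok)
```

Run it with splunkd stopped, or at least before touching the bucket, and point backup_dir somewhere outside the index directory.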
Joshua Rodman
January 13, 2010
err.. only splunkd-generated logfiles “splunkd is happy today, splunkd is sad today” etc.
Joshua Rodman
January 13, 2010
Are you folks going to publish any samples of how to query the search app using REST?
There seems to be zero documentation for Splunk 4.0+ for REST endpoints…
David Montgomery
February 6, 2010
David, I’m told the REST documentation is still underway.. Let me see if I can find some examples to get you going… Are you only looking for the Search API docs?
Thanks Nimish. Very helpful. New to Splunk (just downloaded and setup the free version). Can you post a screenshot of the lookup in action on the splunk page (e.g., does it show up on the Search application page as extra fields)?
T
John
February 10, 2010
John,
Welcome to using Splunk. After running the lookup search command, it generates a new field called country in the example above. You can see the new field from the field picker menu on the left.
Interesting, have you tested using SSDs in raid? Like 4x Intel SSDs?
teb
February 11, 2010
This is potentially huge. Splunk begs for a fully mature query language for advanced users. The question is, is this expected to become a core part of the product or is it an experiment?
Jerry
February 15, 2010
Jerry,
Thanks for your reply.
In my opinion Splunk’s query language, while difficult to learn, *is* extremely powerful and robust enough for advanced users. Is there more detail on this that you could provide? SplunkMSE’s intention is not so much to replace the existing query language as it is to allow SQL/ODBC-based tools to integrate with the Splunk data store. Think of it as an alternative API.
As far as becoming a core part of the product, the intent is to fully support this as an open source project which means as we let more people know about it, the number of features (and bugs/fixes) will go up based on feedback.
I encourage you to give it a try and let me know what you think.
Mark, I had a few comments on your blog post. First off, what company are you talking about? Tripwire? I am assuming so because of the “Log Center” comment.
Secondly, you are being a little liberal when you talk about the schema part. You are absolutely right, the user will have to know the schema of the data coming in in order to generate meaningful reports. That is the case with every single solution out there. Some have great built-in support for data sources, with ArcSight definitely leading the pack on that front. Splunk is not at all different in that regard. Splunk cannot do any magic with regards to generating reports from unknown data sources (syntaxes). You have to define your own field extractions in order to generate reports. Splunk does not support many data sources out of the box. [Except if the data source is key-value. In that case Splunk does auto-extraction].
My last point is more of a question to you. Do you really consider Splunk to be a SIEM? And if so, why?
Cheers, glad to see you hitting the ground running at Splunk!
Thanks! Unfortunately I haven’t had much time to continue with it. But, lucky you, you do! Welcome to Splunk!
Andrea
February 17, 2010
Hey Mark,
Maybe you are “being a little liberal when you talk about the schema part” but you’re spot on when you say, “the entire SIEM market needs to change.” We agree. So if you want a demo of Tripwire Log Center I’ll be at RSA and BSides, so swing by the Tripwire RSA booth and I can walk you through the product.
What I’ve found so far is that Splunk and Tripwire solve fundamentally different problems – the result is that out of all the traditional SIEM vendors we’ve replaced, and out of all the competitive cycles we’ve been in, Splunk hasn’t come up. And when it does, one of us is in the wrong place, rather than a head-to-head battle royale ensuing.
And to that end, I’m most curious for you to respond to Raffael’s questions, “Do you really consider Splunk to be a SIEM? And if so, why?”
Thanks, and hopefully I’ll see you at RSA,
Tim | Product Marketing, Tripwire Log Center
1) you do not need to do Step 2. You need to make no changes to authorize.conf.
2) do not name your search command the name of an existing command. In 4.0 there is already a “shape” command, so in the above example, change the name from “shape” to something else.
Yes – It was Tripwire’s announcement I was referring to. There are some solutions that require editing of a universal parser schema to identify log data coming from a particular data source. Exposing this universal parser to the uninitiated and allowing a user to edit it gives them the opportunity to break support for devices already dealt with by the parser.
With Splunk I don’t have to edit a schema to create a graphic indicating a timeline of actions taken by a user. I just have to create a few field extractions. A few field extractions don’t put a whole universal parser at risk. The universal parser is what I’d call a ‘brittle’ piece of the SIEM product.
I don’t look at Splunk as a SIEM for the following reasons:
1. SIEMs are focused on event management over the last few minutes, hours, or a day at most. Splunk can scale to create real metrics across days, weeks, months or longer.
2. SIEMs only provide a filtered view of the world. They give you the funnel analogy and say send only log data to the SIEM that you feel is security relevant. This pushes operational use cases into their own silo apart from Security issues. Splunk was purpose-built to index any logs including multi-line custom application log data. No such scalability issues for Splunk.
3. SIEMs are built to work with a limited set of data sources. Many vendors have a supported products list that follows a rigid reactive support model. If the SIEM system supports DB2 9.0 and 9.7 comes out, Splunk doesn’t care. It simply makes any new fields available and the user can change a saved search on the fly to take advantage of the new field. That can’t always be said of SIEMs.
The more I look at Splunk, I begin to think of it more as part of an overall Pattern-Based Strategy for business that can accept many distinct business data sources that have security relevance.
From Gartner: Pattern-Based Strategy as It Applies to I&O, by Bill Malik, November 2009
“For internal pattern detection, the infrastructure and operations (I&O) team uses tools such as log analysis, performance measurement, capacity planning, trend analysis, service desk incident reporting and security incident disambiguation; see “Pattern Discovery with Security Monitoring and Fraud Detection Technologies” for a discussion of these tools and techniques. I&O may use the following vendors for pattern seeking:
* Splunk analyzes log records to detect time-sequenced patterns as an aid to problem determination and security analysis.”
Hi Tina – is there a way to “Put” the date into a file name via the outputcsv command?
I’d like to have a scheduled search that outputs the results using outputcsv, but I’m afraid of overwriting the existing file. Ideally, I’d like to do something like this: ….. | outputcsv LoginData[date_Month][date_Day][date_Year]
Is something like this possible? Thanks in advance!
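One possible workaround, sketched under the assumption that the search is dispatched by an external script or scheduler rather than by Splunk's own scheduler: build the search string outside Splunk and put the date into the file name there. This does not show whether outputcsv itself can substitute field values into the name. The base search and the helper name are hypothetical.

```python
from datetime import date


def dated_outputcsv_search(base_search, prefix, day):
    """Append an outputcsv clause whose file name carries month, day and year."""
    filename = f"{prefix}{day:%m%d%Y}"
    return f"{base_search} | outputcsv {filename}"


# Hypothetical base search; in a real script, day would be date.today().
search = dated_outputcsv_search("sourcetype=login", "LoginData", date(2010, 2, 19))
print(search)
```

Because each run produces a different file name, earlier output is not overwritten.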
Tim Osborne
February 19, 2010
So two questions on this.
1. the “mycsvfile.csv” has the form of
clientip,
xxx.xxx.xxx.xxx,
yyy.yyy.yyy.yyy,
zzz.zzz.zzz.zzz,
or
clientip
xxx.xxx.xxx.xxx
yyy.yyy.yyy.yyy
zzz.zzz.zzz.zzz
or
clientip,xxx.xxx.xxx.xxx,yyy.yyy.yyy.yyy,zzz.zzz.zzz.zzz
2. Can this be used for an inclusion search instead of an exclusion search?
i.e. source="/var/log/apache2/mynewdomain_access_log" [inputcsv mycsvfile.csv]
would that search work?
Eric S
February 22, 2010
We also love Quicksilver but I wasn’t aware of the plugin for Salesforce, so we are definitely going to give this a go. What is missing is a plugin for Microsoft Mail merge on the Mac…definitely something that is needed. If you ever come across this drop us a line.
Thanks for posting this post, really happy to find another tool to integrate with Salesforce.
You are clearly a product marketer. I love how you spun my question about Splunk being a SIEM and you pointed out some facets where Splunk can shine compared to a SIEM.
You, unfortunately, cast a very one-sided light. SIEMs have a bunch of advantages over Splunk: Real-time correlation, great out of the box device support (parsers), advanced reporting, pattern discovery (actual features that help you analyze patterns), real-time dashboards, advanced visualization, workflow and collaboration support, ticketing capabilities, asset modeling and correlation, features to tie events to identities or actors, and not to forget, the premise that SIEMs were built around: vulnerability correlation.
All of these capabilities are not supported out of the box by Splunk. Especially the device support is something that users of Splunk will have a super hard time getting up to par with a SIEM. Those companies have teams of 30 plus people that work on those capabilities in a full-time fashion. It’s not a simple problem!
I really don’t know anything about Sybase IQ, but I can tell you a bit more about Splunk. Splunk is a search engine for time-based data such as log events, performance information, configuration changes, etc. Think of it as real-time “Google for IT Data”. It can index a huge amount of data per day and provide extremely fast “needle in a haystack” type of searches. In addition, it provides robust analytics **at search time** on that data. My semi-educated guess is that Splunk and Sybase IQ are apples and oranges. We have never seen it come up as a potential competitor because the use cases are probably very different.
The SplunkMSE add-on is a bridge between SQL and the Splunk Search Engine. Splunk’s native query language is much more Google-like than structured SQL with the addition of many powerful transforms and statistical operators.
Just one correction to your text. I’m a Splunk user since 1.0
Mika
March 9, 2010
If hacking WoW accounts is a billion dollar underground business, then I think there is a need for better internal tracking, and for players to better understand how to protect their accounts.
[...] in nature is about who is calling who and who else did the initial recipient call. Splunk’s transaction search command can be used to group similar records to provide law enforcement this critical data to carry on [...]
RT @packetslave: Wow #vmware ESXi is a noisy syslog beast. 10,000 routine msgs/hour from 3 hosts! #Splunk turned that into 3 we care a ... #11 hours ago
i posted a question in the development forums about dispatching via REST…
http://www.splunk.com/support/forum:SplunkDev/1951
so does the REST search interface support asynchronous dispatches yet?
the main reason i ask is that increasingly we are performing very large searches to feed into reports. my team have limited access to the production system which means passing our searches via email to the operations group to dispatch via the command line. i have been tinkering with the SDKs and REST interface to try and put together something that will allow us to dispatch these searches from the web.
Yes, the REST API is using dispatch under the hood so you can write a client application and get the benefit of long-running searches and job control. I’ll post some more details in the forum.
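To give a feel for what such a client application might look like, here is a rough Python sketch of the dispatch-then-poll pattern. The endpoint path /services/search/jobs and the shape of the XML response are assumptions taken from later Splunk REST documentation and may not match the version discussed here. The helper names are hypothetical, and the network call is left to the caller so the sketch stays self-contained.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def build_dispatch_request(base_url, search):
    """Return (url, body) for a POST that asks the server to create a search job."""
    url = f"{base_url}/services/search/jobs"  # assumed endpoint path
    body = urlencode({"search": search}).encode("utf-8")
    return url, body

def parse_sid(response_xml):
    """Pull the search id out of the (assumed) XML returned by the dispatch call."""
    sid = ET.fromstring(response_xml).findtext("sid")
    if sid is None:
        raise ValueError("no <sid> element in response")
    return sid

def wait_for_job(sid, is_done, pause, max_polls=60):
    """Poll until is_done(sid) is true; the caller supplies the HTTP check and the sleep."""
    for _ in range(max_polls):
        if is_done(sid):
            return True
        pause()
    return False
```

Because the job keeps running on the server between polls, the web front end only has to hold on to the search id rather than a long-lived connection.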
just as well you have “all you can eat” plans. in the land downunder we live in a time warp and have to pay for every friggin’ MB.
bah, posted too early…
i was about to say most excellent.
I can see it now, while kicking back with a beer at the local pub a manager calls screaming that the sky is falling and the world as we know it is about to come to an untimely end. I whip out iSplunk and within seconds I can proclaim “it ain’t our platform baby”.
Of course in the real world it is always our platform and the world is already on a knife’s edge…
Don’t get me started about telcos. I was in that industry a long time…
I’m starting simple with iPhone, something like what I did with the Dashboard widget. The new stuff in Splunk makes it easier because lots of people have already figured out how to parse Atom RSS feeds and I just have to read what I get back from the endpoint. For me the hard part is Cocoa, as I’m a big unix-head. And for iPhone there isn’t much code out there yet to learn from. I also want to tie it in to the push notification service so in a couple months when that is available I can have something on the Splunk side tell you actual useful information when you need it, without having to open the application and go looking for it.
Designing for mobile devices gives you a lot more constraints, limited resources means you have to really think about what is most important and present it to the user when she needs it, and only when she needs it. iPhone has a huge display by mobile standards, but compared to a normal desktop it’s still pathetically small. And building a usable UI without mouse and keyboard is tough.
I’m not the first to think that handheld system administration from a mobile device would be a good idea. There was even a paper about it. The Palm devices at the time could barely handle the complex UI required for a robust tool but expect to see many more sysadmin tools for iPhone. Even with the license restrictions that are annoying not just Open Source developers, it is still the most accessible mobile platform out there. (I’ll believe Android when I see a production device. And I’ll probably be writing code for that too.)
Thanks to Steven,
Known bug with IBM jvm.
Looks like the apache axis jar does not like the IBM jvm.
If you’re not using Sun’s jvm you might wait for me to post an update.
Working on it in the background
bash-3.00# echo $SPLUNK_HOME
/opt/splunk
bash-3.00# splunk -version
Splunk 3.3 (build 38914)
bash-3.00# echo $JAVAHOME
/usr/java5_64/
bash-3.00# which java
/usr/java5/jre/bin/java
bash-3.00# java -version
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build pap32devifx-20070725 (SR5a))
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 AIX ppc-32 j9vmap3223-20070426 (JIT enabled)
J9VM – 20070420_12448_bHdSMR
JIT – 20070419_1806_r8
GC – 200704_19)
JCL – 20070725
bash-3.00# pwd
/opt/splunk/etc/apps/vmware
bash-3.00# java -jar lib/splunk.jar
[ Wed Aug 13 08:51:00 ADT 2008 ] Begin Log.
Started
Exception in thread "main" java.lang.NoClassDefFoundError: sun.security.provider.Sun
at java.lang.J9VMInternals.verifyImpl(Native Method)
at java.lang.J9VMInternals.verify(J9VMInternals.java:66)
at java.lang.J9VMInternals.verify(J9VMInternals.java:64)
at java.lang.J9VMInternals.initialize(J9VMInternals.java:127)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:67)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:521)
at org.apache.commons.discovery.tools.ClassUtils.newInstance(ClassUtils.java:160)
at org.apache.axis.AxisProperties$1.run(AxisProperties.java:183)
at java.security.AccessController.doPrivileged(AccessController.java:193)
at org.apache.axis.AxisProperties.newInstance(AxisProperties.java:166)
at org.apache.axis.components.net.SocketFactoryFactory.getFactory(SocketFactoryFactory.java:75)
at org.apache.axis.transport.http.HTTPSender.getSocket(HTTPSender.java:187)
at org.apache.axis.transport.http.HTTPSender.writeToSocket(HTTPSender.java:404)
at org.apache.axis.transport.http.HTTPSender.invoke(HTTPSender.java:138)
at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)
at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118)
at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83)
at org.apache.axis.client.AxisClient.invoke(AxisClient.java:165)
at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
at org.apache.axis.client.Call.invoke(Call.java:2767)
at org.apache.axis.client.Call.invoke(Call.java:2443)
at org.apache.axis.client.Call.invoke(Call.java:2366)
at org.apache.axis.client.Call.invoke(Call.java:1812)
at com.vmware.vim.VimBindingStub.retrieveServiceContent(VimBindingStub.java:23449)
at com.vmware.apputils.vim.ServiceConnection.connect(ServiceConnection.java:54)
at com.vmware.apputils.vim.ServiceUtil.clientConnect(ServiceUtil.java:36)
at com.vmware.apputils.AppUtil.connect(AppUtil.java:389)
at com.splunk.VMWareHostConnection.init(Splunk4VMI.java:198)
at com.splunk.Splunk4VMI.init(Splunk4VMI.java:309)
at com.splunk.Splunk4VMI.main(Splunk4VMI.java:456)
You can move indexes around pretty easily now, but it’s an operation that can only be done with splunkd not running. So while a script to do it would be possible, any Splunk application (in the sense of something installed in splunk/etc/apps) would need splunkd to execute it.
It would be nice if you could write a script to “package” an existing index as an “app” which you can then easily move about.
This is not a Splunk product, it’s me playing with iPhone at home and I have not had any time to work on it recently. There are also some technical issues in integrating with the push notification service that makes publishing an iPhone client problematic (it must be uniquely tied to a particular back-end server.)
I have an input monitoring /var/log/…, with a whitelist and blacklist that are picking up the correct files.
It’s a central log server, so I’m trying to use transforms/props.conf to get it to set the hosts, but it won’t work.
I’ve tried using sourcetype in props.conf and setting sourcetype in the inputs.conf, doesn’t work.
I’ve also tried [source::/var/log/...] in props.conf, hoping it’d pick up the many different sources in /var/log, but that doesn’t work either. The default host seems to override the extraction.
I’ve read the page here http://www.splunk.com/doc/3.4.3/admin/OverrideHost
but can’t find the right way to get it working. Any ideas?
Another possibility is the complete absence of a .data file.
Each bucket should have a Hosts.data, Sources.data and SourceTypes.data.
for dir in $(find /opt/splunk/var/lib/splunk -name "db_*" -type d); do
echo -n $dir; ls $dir/*.data | wc -l;
done
You should see 3 for each line.
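The same check can be sketched in Python so that it names the missing files instead of only counting them. This is a rough equivalent of the loop above, not a Splunk tool. The three file names come from the comment above, and the function name is hypothetical.

```python
from pathlib import Path

# The three metadata files each bucket is expected to contain.
EXPECTED = ("Hosts.data", "Sources.data", "SourceTypes.data")

def missing_metadata(index_root):
    """Map each db_* bucket directory to the .data files it lacks."""
    problems = {}
    for bucket in Path(index_root).rglob("db_*"):
        if not bucket.is_dir():
            continue
        missing = [name for name in EXPECTED if not (bucket / name).is_file()]
        if missing:
            problems[str(bucket)] = missing
    return problems
```

Running missing_metadata("/opt/splunk/var/lib/splunk") should return an empty dict when every bucket has all three files.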
FmyI:
for dir in $(find /opt/splunk/var/lib/splunk -name "db_*" -type d); do echo -n $dir; ls $dir/*.data | wc -l; done
Hello,
I downloaded and installed the free splunk, I would like to use it for centralized reporting and monitoring. I downloaded the blue coat app but have no idea how to install…help please.
Thank you
I’ve had success using the following command to rebuild the .data files from the index information:
recover-metadata
I’m not sure of all of the side effects of this, but here are two considerations: (1) This will completely replace your existing .data files (assuming the index itself isn’t corrupt; in which case you can end up with less data than you had in the first place), so you should make a backup of your .data files first. And (2), all of your hosts, sources, and sourcetypes will be in all lowercase. I wrote a small python script to guess what the original case was based on other .data files in the top-level db directory. You can see/get a copy here: http://pastebin.ca/1481049
There’s no comma there after clientip, and each value on one line. I haven’t specifically tested using it to include things (basically as a whitelist) but offhand I don’t see why it wouldn’t work. It’s a valid subsearch and you can NOT or not, as you prefer.
n.b. I haven’t tried this on 4.0, although I have no reason to expect it wouldn’t still work.
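To make the file format and the include/exclude idea concrete, here is a small Python sketch that reads a CSV in the second form (a clientip header, then one value per line) and builds roughly the filter a subsearch would contribute. The exact text Splunk generates from a subsearch is an assumption here, and the function name is hypothetical.

```python
import csv
import io

def csv_to_filter(csv_text):
    """Turn CSV rows into an OR'd filter, AND-ing the fields within each row."""
    clauses = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        pairs = [f'{field}="{value}"' for field, value in row.items()]
        clauses.append("(" + " AND ".join(pairs) + ")")
    return "(" + " OR ".join(clauses) + ")"

def build_search(base, csv_text, exclude=False):
    """Prefix the filter with NOT for a blacklist, or leave it bare for a whitelist."""
    prefix = "NOT " if exclude else ""
    return f"{base} {prefix}{csv_to_filter(csv_text)}"
```

With exclude=False the list acts as a whitelist, and with exclude=True the same file acts as a blacklist.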
If you downloaded the application from the Splunk UI, there is a green “Install App” button.
When is the free version expected to come out? i really liked version and I just did a reinstall, but it is no longer free.
Erik, you might check out Nigel Kersten and his weblog/site for more info on Puppet. I believe he is one of the Puppet Masters. http://explanatorygap.net/category/puppet/
try http://en.wikipedia.org/wiki/Creative_destruction for more helpful examples. you don’t just want to disrupt. you want to make something new.
Funny, i ( tried ) to read Schumpeter Capitalism, Socialism and Democracy over the summer. Struggled a bit but could see the relevance for the modern entrepreneur. I should have read it more carefully since i missed the Creative Destruction point. Thanks for the tip, will try again.
There’s something else Splunk could get from Puppet: ideas for configuration syntax.
Splunk 4 overextends INI syntax by putting extra syntax inside stanza names like [host::nyc*], inside key names (MORE_THAN_80 or REPORT-) and inside key values (LOOKUP-foo = mytable userid AS myuserid OUTPUT username AS myusername)
Two years ago, in 2007, Splunk did some soul-searching on layered configuration.
http://blogs.splunk.com/rob/2007/10/02/software-configuration-why-does-this-wheel-need-re-invention/
The problem is that configuration is domain-specific, and “using XML” really means “building a DSL around XML syntax”. Splunk just chose to abuse INI syntax for its DSL instead of XML syntax.
I think that Splunk looks constrained by INI syntax and could do more with a Puppet-like resource syntax. See http://reductivelabs.com/trac/puppet/wiki/DocumentationStart. The language reference says “Resources are fundamentally built from a type, a title, and a list of attributes”
It would be declarative, human-read/writeable domain-specific language, two-way convertible with the more machine-readable YAML (or XML) representations. Might feel like:
extraction { "ip":
regex => "(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})",
}
props { "fooprops":
extractions => "ip",
check_method => entire_md5,
}
host { "*.foo.com":
props => "fooprops",
}
Although, I would add that if everything Splunk needs can be expressed in hand-written extended-syntax INI files there’s no point in expending resources on a more general Puppet-like resource syntax (which may not even be a good fit for the Splunk domain). Another case of if it ain’t broken don’t fix it.
And it takes much the same amount of time to get used to an INI-based DSL as for any other (such as Apache, Puppet, or XML-based one).
Or not. The puppet syntax describes resources and instantiates them, but each named resource of a given type must occur once only. Because the resource declarations are unordered, one cannot make a props{ “foo”: …} in two places with one overriding the other. This is unlike INI syntax where a stanza in one file can be made to override the same-titled stanza in another file.
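The INI-style override can be illustrated with Python’s standard configparser, which merges same-titled stanzas when layers are read in order. This is a sketch of the general cascading idea, not of how Splunk itself resolves its configuration files. The stanza name and the key values are hypothetical.

```python
import configparser

def load_layers(*layers):
    """Read INI texts in order; later layers override same-titled stanzas key by key."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep key case instead of lowercasing it
    for text in layers:
        parser.read_string(text)
    return parser

DEFAULT_LAYER = "[host::nyc*]\ncheck_method = entire_md5\nTZ = UTC\n"
LOCAL_LAYER = "[host::nyc*]\nTZ = US/Eastern\n"

merged = load_layers(DEFAULT_LAYER, LOCAL_LAYER)
```

After loading, merged["host::nyc*"]["TZ"] comes from the local layer while check_method survives from the default layer. A declare-once resource model has no direct equivalent of that key-level override.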
May as well delete the earlier comments, the Puppet resource syntax won’t do for cascading configuration files.
It would have been nice to have a link to the download!
Hey Erik,
In your scaling model are you forwarding the same source data to all indexers or are you splitting the sources up and sending 1/N of the sources to each indexer (where N is your number of indexers)?
Is there a limitation on the size of log files that splunk free can process?
Hello
This is great! We did purchase a 5gb enterprise license for our critical systems but this free version is a good option for our test lab. Does it come with a real free beer coupon ;p ?
Enjoy!
So, just to understand what “free” means and what it does…
The only difference that I see between 4.0’s [disabled] features and that of 2.x-3.x is that the alerting is disabled in 4.0 as opposed to the earlier versions where the alerting functionality was included. Do I have this right?
This was a somewhat useful piece, and I’m curious as to whether the specific decision to leave this out was made for any particular reason?
And, yes, I understand the verbiage about this being for “personal” use. The problem is that for small environments where we’re expected to try to find FOSS solutions for as many infrastructure applications as possible, there’s a big step there to the least-expensive license.
(Not to sound ungrateful, of course – just want to be clear on this – we do love the product…)
Few replies in one…. ( Fernando, Nick and K.M. )
Fernando, you can always get a free beer here at Splunk. If you are ever in SF, come by, no coupon needed! We love it when users stop by for a drink.
Nick, there is no limitation on the size of files you can eat for any given day. Go ahead and index a terabyte. The limitations are that you cannot do that for 3 days in a rolling 30 day period – ya, a bit complicated. The free product is *designed* for ad-hoc use with large volumes or for lower volumes of continuous input. Feel free to bug me if that does not make sense.
K.M., it was a long process trying to decide if we can/should pull alerting out of free. We have the tricky balance of trying to provide a great free product but also make some money along the way. It was not an easy or clean cut decision and we knew that people would be bummed. I can hint that you can work around it yourself using cron or equivalent; it’s not as easy but then it shouldn’t be too hard either. We are committed to providing a great and useful free product and finding the right balance of features will be an ongoing challenge. Perhaps most important is your feedback – do let us know what you want in, what features we are missing altogether, what does not make sense, etc. It’s really hard to build a great product without lots of input.
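Since the rolling-window rule is a bit complicated, here is a small Python sketch of one reading of it: the limit is hit once three over-quota days fall inside any 30-day window. The daily quota is left as a parameter because no figure is given above. Treating the third over-quota day as the trigger is an assumption, and the function name is hypothetical.

```python
def violates_license(daily_volumes, quota, max_violations=3, window=30):
    """Return True once any rolling window holds max_violations over-quota days."""
    over = [volume > quota for volume in daily_volumes]
    for end in range(len(over)):
        start = max(0, end - window + 1)
        # Count the over-quota days in the window that ends on this day.
        if sum(over[start:end + 1]) >= max_violations:
            return True
    return False
```

Under this reading, one or two very large ad-hoc loads in a month are fine, while a steady input that exceeds the quota every day trips the limit on the third day.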
Hi Dale, sorry for the late reply.
I recommend auto load balancing using splunk forwarders. This model will split the data from (n) forwarders evenly across (m) indexers. Splitting the data evenly has huge search performance benefits. In general there are lots of advantages to using splunk forwarders at the origin and using auto LB.
If you are using syslog it’s a bit trickier and in that model i might still use a forwarder ( or two ) to listen for syslog but then have that forwarder auto LB across the indexers.
Not sure that helps or if i even answered the right question
If not, let me know and i’ll try again.
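To illustrate why the split tends to even out, here is a toy Python simulation in which every forwarder picks a random indexer for each time interval. Random per-interval selection is an assumption about how auto load balancing behaves, not a description of the forwarder’s actual algorithm. All names are hypothetical.

```python
import random
from collections import Counter

def simulate_auto_lb(forwarders, indexers, intervals, seed=0):
    """Count how many forwarder-intervals each indexer receives."""
    rng = random.Random(seed)
    load = Counter({indexer: 0 for indexer in indexers})
    for _ in range(intervals):
        for _forwarder in range(forwarders):
            # Each forwarder independently picks a target for this interval.
            load[rng.choice(indexers)] += 1
    return load
```

Over many intervals each of the m indexers ends up with close to 1/m of the total, which is the even spread that helps search performance.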
Hi Andrea,
Do you know if development of the iPhone app has progressed, or is it on hold due to the release of Splunk 4? This is something we could really use as a small company. There are only a handful of technical people on staff. As a result our on call schedule rotates through several people who are not devs. Having an iphone app where they can quickly look at problems would be very beneficial so they can quickly see if something is a temporary hiccup that has recovered or a real system failure.
Thanks for the thoughts Justin.
Open sourcing “Splunk” becomes complicated when you start to pull on that string – we wrestled with that decision for years when starting out. I’m not sure if I would do things differently if I had to do it all over again, but the pro/con lists make it a very close call.
We are starting now to publish some of our code as open source. I don’t know if we will ever get to open sourcing the engine, indexer, forwarder, … code. We have often talked about how cool it would be to provide a database to the community – we were huge fans of Sleepycat/Berkeley DB and often thought that would be a good model. Targeting developers by providing a Splunk engine is a fantastic idea, i’m just not sure what the model looks like – but i know there are lots of folks out there who would use it.
I’m working on 2010 planning and will add “stuff” to look at how to get our engine out to developers better – I’m not sure if open source is the answer, but will look at all options.
Thank you very much for bringing it up, we need your input to keep us heading the right direction.
Regards,
e
Eric, great to hear! I’ve long been a fan of Splunk and often recommend it to our clients. Splunk is the only sane way to do large scale log analysis in my opinion.
A few years ago Splunk had promised an open source version. I fully understand the reasons why Splunk decided against this. However, now that you have achieved profitability and version 4.0 has been out for a few months, would Splunk consider honoring their original promise by releasing their older 2.0 or 3.0 versions under an open source license such as GPL or AGPL? Even if you held back the web interface (even though I’d love to see that opensourced as well) and only released the indexing engine, the CLI, forwarding/receiving, and API, I’m sure there are many FOSS projects and Linux distributions that would be excited to re-use your code.
I personally work on various open source projects and have longed to use Splunk in them. In particular we are about to start a complete rewrite of the BASE project (base.secureideas.net) and would love to replace our backend database (currently MySQL) with an opensource version of Splunk. I am also a lead developer of the Samurai-WTF Live CD project (samurai.inguardians.com) and would love to include an open source version of Splunk to collect the output of the various pentest tools. The distribution and modification restrictions of Splunk Free are simply show stoppers for these purposes.
I understand that Splunk may decide this isn’t possible, but since you run your company in such an open-to-the-community manner, I thought it was worth the effort to ask. Regardless of your decision, I will continue to be a huge fan and will always remain appreciative of your Free versions.
Hi Erik –
I arrived here google searching about how to get my saved scheduled searches migrated to 4.0. I’m a bit disappointed about the lack of search notifications for free licenses upgraded from 3.0 to 4.0x, and wish more was mentioned in this regard in the migration considerations sections of the manual.
Perhaps you could consider in a future release retaining search notifications for users who are performing an upgrade install from 3.x, but not marketing it to any new install free customers?
Hi Paul,
It was a hard decision to remove the notification and one not taken lightly. I hope that it was obvious that the feature was no longer available; we tried hard to explain this to folks before upgrade. I’m not sure if that feature will come back to the free product at some point. In the meantime, the only recommendation i have is to use some other scheduler to run the search for you. I know this is extra work but we needed to have some difference between the free and pay-for version or we would not be able to keep all the engineers working on the free product.
I hope you understand and if you have other ideas please let me know.
Regards,
e
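For anyone wanting to try the external-scheduler route, here is a rough Python sketch of a script that cron could run: it executes a search through the Splunk CLI and notifies when the result count passes a threshold. The install path, the idea that each non-blank output line is one result, and the helper names are all assumptions. The notify callback stands in for whatever mail or paging mechanism you use.

```python
import subprocess

def run_splunk_cli(search, splunk_bin="/opt/splunk/bin/splunk"):
    """Run a search through the Splunk CLI and return its stdout (path is an assumption)."""
    result = subprocess.run([splunk_bin, "search", search],
                            capture_output=True, text=True, check=True)
    return result.stdout

def count_results(cli_output):
    """Count non-blank lines, treating each as one result row (an assumption)."""
    return sum(1 for line in cli_output.splitlines() if line.strip())

def check_and_alert(run_search, notify, threshold):
    """Call notify with a short message when the search returns more rows than threshold."""
    rows = count_results(run_search())
    if rows > threshold:
        notify(f"search returned {rows} rows (threshold {threshold})")
        return True
    return False
```

A cron entry would then call something like check_and_alert(lambda: run_splunk_cli("error"), send_mail, 0), where send_mail is your own function.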
Hi,
I’m thinking of giving this a go; however, we are running vsphere and esx 4.0. Has any work been done to confirm what does and doesn’t work?
Cheers
Hey Ste,
I’ll be posting a new version of the VMWare app in a week or so – i’d wait.
I’ll post back here when it’s posted – it won’t be long.
Thanks for the comment!!
e
hi there andrea:
i found this program read some about it & installed wanting to know what makes my machine tick, can i after scanning begin making my machine a performance hotrod?
what type of apps. are you working on now, or will be in the future? Can i search the entire internet web for information without being tied into a network?
so Id like to write an app also..
jimoer i ke
recover-metadata is the right solution, when:
1 – you know where the problem is (which bucket)
2 – the problem is in a real index of your data
Finding the problem is what the above posts are about, of course.
Recommendations for recover-metadata:
- you may want to copy your pre-existing .data files somewhere, in case of sadness.
- If you have a bucket that recover-metadata doesn’t work on, please consider whether you can provide it to us so we can : 1 – fix the bucket, 2 – fix recover-metadata
CAVEAT: recover-metadata is likely to work poorly on the _internal index, because of its design. Strings like source::foo in the event text are likely to fool it, and we have a certain amount of that in _internal. Luckily, you probably don’t care about _internal too much, as it’s only splunk logs.
err.. only splunkd-generated logfiles “splunkd is happy today, splunkd is sad today” etc.
Are you folks going to publish any samples of how to query the search app using REST?
There seems to be zero documentation for Splunk 4.0+ for REST endpoints…
David, I’m told the REST documentation is still underway.. Let me see if I can find some examples to get you going… Are you only looking for the Search API docs?
Thanks Nimish. Very helpful. New to Splunk (just downloaded and setup the free version). Can you post a screenshot of the lookup in action on the splunk page (e.g., does it show up on the Search application page as extra fields)?
T
John,
Welcome to using Splunk. After running the lookup search command, it generates a new field called country in the example above. You can see the new field from the field picker menu on the left.
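Conceptually, the lookup behaves like the Python sketch below: each event is matched against a table on a key field, and the matching value is attached as a new field. This only illustrates the idea and is not Splunk’s implementation. The key field clientip and the sample table are hypothetical; only the output field name country comes from the example.

```python
def apply_lookup(events, table, key, out_field):
    """Return copies of the events with out_field added where the key matches the table."""
    enriched = []
    for event in events:
        row = dict(event)
        if event.get(key) in table:
            row[out_field] = table[event[key]]
        enriched.append(row)
    return enriched

# Hypothetical lookup table and events.
COUNTRY_BY_IP = {"10.0.0.1": "US", "10.0.0.2": "DE"}
EVENTS = [{"clientip": "10.0.0.1"}, {"clientip": "10.9.9.9"}]
ENRICHED = apply_lookup(EVENTS, COUNTRY_BY_IP, "clientip", "country")
```

Events with no match are passed through unchanged, so the new field only shows up where the table has an entry.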
Interesting, have you tested using SSDs in RAID? Like 4x Intel SSDs?
This is potentially huge. Splunk begs for a fully mature query language for advanced users. The question is, is this expected to become a core part of the product or is it an experiment?
Jerry,
Thanks for your reply.
In my opinion Splunk’s query language, while difficult to learn, *is* extremely powerful and robust enough for advanced users. Is there more detail on this that you could provide? SplunkMSE’s intention is not so much to replace the existing query language as it is to allow SQL/ODBC-based tools to integrate with the Splunk data store. Think of it as an alternative API.
As far as becoming a core part of the product, the intent is to fully support this as an open source project which means as we let more people know about it, the number of features (and bugs/fixes) will go up based on feedback.
I encourage you to give it a try and let me know what you think.
Mark, I had a few comments on your blog post. First off, what company are you talking about? Tripwire? I am assuming so because of the “Log Center” comment.
Secondly, you are being a little liberal when you talk about the schema part. You are absolutely right, the user will have to know the schema of the data coming in in order to generate meaningful reports. That is the case with every single solution out there. Some have great built-in support for data sources, with ArcSight definitely leading the pack on that front. Splunk is not at all different in that regard. Splunk cannot do any magic with regards to generating reports from unknown data sources (syntaxes). You have to define your own field extractions in order to generate reports. Splunk does not support many data sources out of the box. [Except if the data source is key-value. In that case Splunk does auto-extraction].
My last point is more of a question to you. Do you really consider Splunk to be a SIEM? And if so, why?
Cheers, glad to see you hitting the ground running at Splunk!
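The key-value case mentioned above is easy to picture with a short Python sketch: a single generic pattern pulls name=value pairs out of a line without any per-device parser. This illustrates the general technique only and is not Splunk’s extraction code. The pattern, the function name and the sample line are hypothetical.

```python
import re

# A word, an equals sign, then either a quoted string or a run of non-space characters.
KV_PATTERN = re.compile(r'(\w+)=("[^"]*"|\S+)')

def extract_fields(line):
    """Return the key=value pairs found in a log line as a dict."""
    return {key: value.strip('"') for key, value in KV_PATTERN.findall(line)}

FIELDS = extract_fields('2010-02-01 12:00:00 user=alice action="login failed" src=10.0.0.5')
```

Log formats that do not follow key=value conventions are where hand-written field extractions, or a vendor’s parsers, come in.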
This is one helpful article, Andrea.
Thanks! Unfortunately I haven’t had much time to continue with it. But, lucky you, you do! Welcome to Splunk!
Hey Mark,
Maybe you are “being a little liberal when you talk about the schema part” but you’re spot on when you say, “the entire SIEM market needs to change.” We agree. So if you want a demo of Tripwire Log Center I’ll be at RSA and BSides, so swing by the Tripwire RSA booth and I can walk you through the product.
What I’ve found so far is that Splunk and Tripwire solve fundamentally different problems – the result is that out of all the traditional SIEM vendors we’ve replaced, and out of all the competitive cycles we’ve been in, Splunk hasn’t come up. And when it does, one of us is in the wrong place, rather than a head-to-head battle royale ensuing.
And to that end, I’m most curious for you to respond to Raffael’s questions, “Do you really consider Splunk to be a SIEM? And if so, why?”
Thanks, and hopefully I’ll see you at RSA,
Tim | Product Marketing, Tripwire Log Center
Brief update for Splunk 4.0…
1) you do not need to do Step 2. You need to make no changes to authorize.conf.
2) do not name your search command the name of an existing command. In 4.0 there is already a “shape” command, so in the above example, change the name from “shape” to something else.
Yes – It was Tripwire’s announcement I was referring to. There are some solutions that require editing of a universal parser schema to identify log data coming from a particular data source. Exposing this universal parser to the uninitiated and allowing a user to edit it gives them the opportunity to break support for devices already dealt with by the parser.
With Splunk I don’t have to edit a schema to create a graphic indicating a timeline of actions taken by a user. I just have to create a few field extractions. A few field extractions don’t put a whole universal parser at risk. The universal parser is what I’d call a ‘brittle’ piece of the SIEM product.
I don’t look at Splunk as a SIEM for the following reasons:
1. SIEMs are focused on event management over the last few minutes, hours, or a day at most. Splunk can scale to create real metrics across days, weeks, months or longer.
2. SIEMs only provide a filtered view of the world. They give you the funnel analogy and say send only log data to the SIEM that you feel is security relevant. This pushes operational use cases into their own silo apart from Security issues. Splunk was purpose-built to index any logs, including multi-line custom application log data. No such scalability issues for Splunk.
3. SIEMs are built to work with a limited set of data sources. Many vendors have a supported products list that follows a rigid reactive support model. If the SIEM system supports DB2 9.0 and 9.7 comes out, Splunk doesn’t care. It simply makes any new fields available and the user can change a saved search on the fly to take advantage of the new field. That can’t always be said of SIEMs.
The more I look at Splunk, I begin to think of it more as part of an overall Pattern-Based Strategy for business that can accept many distinct business data sources that have security relevance.
From Gartner: Pattern-Based Strategy as It Applies to I&O, by Bill Malik, November 2009
“For internal pattern detection, the infrastructure and operations (I&O) team uses tools such as log analysis, performance measurement, capacity planning, trend analysis, service desk incident reporting and security incident disambiguation; see “Pattern Discovery with Security Monitoring and Fraud Detection Technologies” for a discussion of these tools and techniques. I&O may use the following vendors for pattern seeking:
* Splunk analyzes log records to detect time-sequenced patterns as an aid to problem determination and security analysis.”
I think Splunk is in a category by itself.
Hi Tina – is there a way to “Put” the date into a file name via the outputcsv command?
I’d like to have a scheduled search that outputs the results using outputcsv, but I’m afraid of overwriting the existing file. Ideally, I’d like to do something like this: ….. | outputcsv LoginData[date_Month][date_Day][date_Year]
Is something like this possible? Thanks in advance!
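The naming scheme asked about above can be pictured as building the file name from the date before the results are written. The sketch below is plain Python, not Splunk search language, and it does not show how outputcsv itself behaves; the helper name `dated_csv_name` and the underscore-separated layout are assumptions made for illustration.

```python
from datetime import date

def dated_csv_name(prefix: str, day: date) -> str:
    """Return a date-stamped CSV file name (hypothetical layout)."""
    # Month, day, year order mirrors the [date_Month][date_Day][date_Year] idea above.
    return f"{prefix}_{day.strftime('%m_%d_%Y')}.csv"

if __name__ == "__main__":
    # Each day's run gets its own file name, so an earlier file is not overwritten.
    print(dated_csv_name("LoginData", date.today()))
```

Because the date is part of the name, two runs on different days never target the same file; two runs on the same day still would.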
So two questions on this.
1. Does “mycsvfile.csv” have the form of
clientip,
xxx.xxx.xxx.xxx,
yyy.yyy.yyy.yyy,
zzz.zzz.zzz.zzz,
or
clientip
xxx.xxx.xxx.xxx
yyy.yyy.yyy.yyy
zzz.zzz.zzz.zzz
or
clientip,xxx.xxx.xxx.xxx,yyy.yyy.yyy.yyy,zzz.zzz.zzz.zzz
2. Can this be used for an inclusion search instead of an exclusion search?
For example: source="/var/log/apache2/mynewdomain_access_log" [inputcsv mycsvfile.csv]
Would that search work?
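To make the difference between the three layouts concrete, here is a minimal Python sketch that reads CSV text with a header row and turns each data row into one term of an OR filter. It is only a model of the idea behind an inclusion filter: the function name `csv_to_or_clause` and the exact output syntax are assumptions for illustration, not Splunk's own code or documented output.

```python
import csv
import io

def csv_to_or_clause(csv_text: str) -> str:
    """Turn CSV rows into a ( field="value" ) OR ... filter string."""
    rows = csv.DictReader(io.StringIO(csv_text))
    terms = []
    for row in rows:
        # Each row becomes one group of field="value" pairs; empty cells
        # (for example, those produced by a trailing comma) are skipped.
        pairs = [f'{k}="{v}"' for k, v in row.items() if k and v]
        if pairs:
            terms.append("( " + " AND ".join(pairs) + " )")
    # No data rows means there is nothing to filter on.
    return "( " + " OR ".join(terms) + " )" if terms else ""

if __name__ == "__main__":
    one_per_line = "clientip\n10.0.0.1\n10.0.0.2\n"
    single_line = "clientip,10.0.0.1,10.0.0.2\n"
    print(csv_to_or_clause(one_per_line))
    print(repr(csv_to_or_clause(single_line)))
```

In this model the header-then-one-value-per-line layout yields one term per address, while the single-line layout is read as a header with no data rows and so yields no terms at all.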
We also love Quicksilver, but I wasn’t aware of the plugin for Salesforce, so we are definitely going to give this a go. What is missing is a plugin for Microsoft mail merge on the Mac…definitely something that is needed. If you ever come across one, drop us a line.
Thanks for this post; really happy to find another tool to integrate with Salesforce.
Mark,
You are clearly a product marketer. I love how you spun my question about Splunk being a SIEM and you pointed out some facets where Splunk can shine compared to a SIEM.
You, unfortunately, cast a very one-sided light. SIEMs have a bunch of advantages over Splunk: Real-time correlation, great out of the box device support (parsers), advanced reporting, pattern discovery (actual features that help you analyze patterns), real-time dashboards, advanced visualization, workflow and collaboration support, ticketing capabilities, asset modeling and correlation, features to tie events to identities or actors, and not to forget, the premise that SIEMs were built around: vulnerability correlation.
None of these capabilities is supported out of the box by Splunk. Device support in particular is something Splunk users will have a very hard time bringing up to par with a SIEM. Those companies have teams of 30-plus people who work on those capabilities full time. It’s not a simple problem!
Does Splunk work with the Sybase IQ Data Analytics software? Would Splunk compete with it or complement it?
“That’s what Splunk would say if it could talk.”
Splunk *can* talk : http://www.splunkbase.com/apps/All/4.x/app:Audible+Alerts+using+Nabaztag:Tag+(Wifi+Rabbit)
I really don’t know anything about Sybase IQ, but I can tell you a bit more about Splunk. Splunk is a search engine for time-based data such as log events, performance information, configuration changes, etc. Think of it as real-time “Google for IT Data”. It can index a huge amount of data per day and provide extremely fast “needle in a haystack” searches. In addition, it provides robust analytics **at search time** on that data. My semi-educated guess is that Splunk and Sybase IQ are apples and oranges. We have never seen it come up as a potential competitor because the use cases are probably very different.
The SplunkMSE add-on is a bridge between SQL and the Splunk Search Engine. Splunk’s native query language is much more Google-like than structured SQL with the addition of many powerful transforms and statistical operators.
I am thrilled to see Splunkbase back!
If you’re a player, you know how to protect your account, so there is no need to worry about attacks from this hacker.
It was a great event!
Splunkdude has blogged about the event. Here you can see me: http://splunkdude.wordpress.com/2010/03/08/update-3-splunk-live-930am-to-1200pm-splunk-live/
Just one correction to your text: I’ve been a Splunk user since 1.0.
If hacking WoW accounts is a billion-dollar underground business, then I think there is a need for better internal tracking, and for players to better understand how to protect their accounts.
You’ve got to include this comic in a discussion of SQL Injection. http://xkcd.com/327/
5 Trackbacks
[...] Check out my rap song called “Splunk IT” [...]
[...] the guessing from timestamp extraction, line breaking, sourcetyping. For your convenience, these 3 topics are covered separately in my [...]
[...] Splunkbase! | Splunk Blogs [...]
[...] in nature is about who is calling who and who else did the initial recipient call. Splunk’s transaction search command can be used to group similar records to provide law enforcement this critical data to carry on [...]
Thank you …
Just between us, I think this is obvious. Have you tried searching on google.com?…