All the Data That’s Fit to Visualize – SOURCE Boston 2008
I gave a talk at SOURCE Boston 2008. This time the topic was visualization in general and what has gone wrong in security visualization in the past. I showed how we can learn and steal from other disciplines, in this case from the New York Times. The NYT has done some pretty fantastic work in the area of data visualization. Their interactive market map, for example, is a great way of exploring stock data. During the talk, I outlined some of the design principles that the NYT graphics department uses when designing its graphs: Show – Don’t Tell.
To start my presentation, I showed a little video about security visualization (see below).
Splunk for Virtualization
I’m looking for some help.
I’ve built a VMware app for Splunk and am in the process of doing the same for Xen. These apps use the VMware and XenSource APIs to index everything about the VM environment. When combined with Splunk instances running within the guest OS, you get a very comprehensive historical picture. Are there any Splunk customers out there using VMware or Xen? I’m looking for use cases so that I better understand how to configure the apps. I’d like to know what types of information would be useful to capture and what types of searches one would want to perform. Both Xen and VMware have so much data available that configuration could be complicated. I’m trying…
The Splunk Python client library (part 1)
Splunk 3.2 introduces a publicly available Python client library that allows external developers to programmatically interact with Splunk by importing a few key modules.
The easiest way to get started with the client library is to get into Splunk’s Python environment. Locate your Splunk install directory (/opt/splunk by default), and start the Python interactive shell that comes with Splunk:
# bin/splunk cmd python
This will launch the interactive Python prompt, which starts off looking like this:
Python 2.5.1 (r251:54863, Nov 18 2007, 16:13:41)
[GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
Starting a search
Import the Splunk modules:
import splunk.auth
import splunk.search as se
If you have installed Splunk with the default settings, then your hostpath is https://localhost:8089. The client library knows…
P-Camp preso on automating product management with Jira
Here’s the presentation that I gave this past Saturday at P-Camp, the unconference for product managers. If you’ve been following what we’re doing here with automating product management using Jira, there’s detail and screenshots in this presentation that might be interesting.
Digging into metrics.log
Occasionally people ask for help in identifying a rogue data input that is suddenly spewing events. If it’s hidden in a ton of similar data, it can be difficult to sort out which one is actually the problem. One place to look is the Splunk internal metrics.log. You can find it by searching the internal index (add “index=_internal” to your search) or just look in the file itself (located in $SPLUNK_HOME/var/log/splunk).
Before I get into what can be found there, I need to explain what metrics.log is not. It is not an exact accounting of all your inputs; it is a sampling over 30-second intervals. For each type of item reported, you get the top ten hot…
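To give a feel for the kind of digging this allows, here is a minimal sketch that totals the reported kilobytes per source from a few metrics.log-style lines. The sample lines are made up, and the field names (group, series, kb) and the per_source_thruput group are assumptions about the key=value layout; check them against your own file. Because the log is a sampling, the totals point at the noisy input rather than measure it exactly.

```python
import re
from collections import defaultdict

# Hypothetical sample lines; field names and values are assumptions,
# not copied from a real metrics.log.
SAMPLE_LINES = [
    '03-13-2008 10:00:00.000 INFO  Metrics - group=per_source_thruput, '
    'series="/var/log/messages", kbps=0.5, eps=2.0, kb=15.0',
    '03-13-2008 10:00:00.000 INFO  Metrics - group=per_source_thruput, '
    'series="/var/log/noisy.log", kbps=40.0, eps=300.0, kb=1200.0',
    '03-13-2008 10:00:30.000 INFO  Metrics - group=per_source_thruput, '
    'series="/var/log/noisy.log", kbps=45.0, eps=320.0, kb=1350.0',
    '03-13-2008 10:00:30.000 INFO  Metrics - group=per_host_thruput, '
    'series="web01", kbps=45.5, eps=322.0, kb=1365.0',
]

# Matches key=value pairs where the value is either quoted or a bare token.
PAIR = re.compile(r'(\w+)=("[^"]*"|[^,\s]+)')


def parse_fields(line):
    """Return the key=value pairs of one log line as a dict."""
    return {key: value.strip('"') for key, value in PAIR.findall(line)}


def kb_by_series(lines, group="per_source_thruput"):
    """Sum the kb field per series for one group, largest total first."""
    totals = defaultdict(float)
    for line in lines:
        fields = parse_fields(line)
        if fields.get("group") == group and "kb" in fields:
            totals[fields["series"]] += float(fields["kb"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranked = kb_by_series(SAMPLE_LINES)
for series, kb in ranked:
    print(f"{series}: {kb:.1f} KB")
```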
6000 Harvard applicants’ personal data on Bittorrent
Harvard just learned security investigation 101 the hard way.
Harvard admitted yesterday that a web server containing financial application data for over 10,000 applicants was hacked a month ago. They knew about the incident on February 15 and took down the server until February 21 in order to investigate and implement stronger security controls. Their announcement reveals how slow and ineffective security investigations often are.
“The University’s initial examination did not reveal the full extent of the hack. As the investigation continued, it became apparent that some sensitive applicant data, including Social Security numbers, could potentially have been accessed.”
Unfortunately, a day later, it was pretty obvious that over 6,000 applicants’ data had been compromised – CNet reports that all their personal data…
The Splunk Platform Has Launched
Without a doubt the past week has been the most amazing week in Splunk history. The crazy coast to coast multi-city launch left us all exhausted and electrified. A few of the things that stick in my mind…
First, Splunk 3.2, including Splunk for Windows, went live on our download page last Saturday, and more than 40% of our downloads in the past week have been for our new Windows version. Then Nick Selby of 451 Group wrote an analyst brief on us. He said, “Splunk is awesome: it’s multiplatform, easy to install and easy to use. And with an abstraction layer of logs, configuration files and system messages, traps and alerts, it’s seriously useful.” 451 has a reputation for ripping vendors, so we’re…
Using the Atom Feed Format in Enterprise Software
XML is a great format for exchanging information because it balances readability, extensibility, and compatibility across heterogeneous environments. However, its flexibility is also a disadvantage because it is far too easy to create a proprietary XML schema, resulting in lots of custom code to interface with various systems. Lots of custom code leads to brittleness, and brittleness leads to frustration. The key to salvation lies in standardization.
Enter the Atom standard: a standards-track schema that defines a generic collection/item container format in XML. Most people think of Atom as an RSS competitor, which is true, but that only covers half of what it does. The Atom Publishing Protocol is a well-defined protocol for performing CRUD (Create, Read, Update, Delete) operations on items…
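As a rough illustration of the collection/item idea, the sketch below reads the entries out of a tiny Atom feed with Python's standard library. The feed content (titles, ids, timestamps) is invented for the example; only the Atom namespace and the feed/entry element names come from the standard.

```python
import xml.etree.ElementTree as ET

# The Atom namespace is fixed by the standard; the feed itself is made up.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example collection</title>
  <id>urn:example:feed</id>
  <updated>2008-03-13T10:00:00Z</updated>
  <entry>
    <title>Failed logins</title>
    <id>urn:example:entry:1</id>
    <updated>2008-03-13T09:00:00Z</updated>
  </entry>
  <entry>
    <title>Disk errors</title>
    <id>urn:example:entry:2</id>
    <updated>2008-03-13T09:30:00Z</updated>
  </entry>
</feed>"""


def read_entries(feed_xml):
    """Return (id, title) for every entry (item) in the feed (collection)."""
    root = ET.fromstring(feed_xml)
    return [
        (
            entry.findtext("atom:id", namespaces=ATOM_NS),
            entry.findtext("atom:title", namespaces=ATOM_NS),
        )
        for entry in root.findall("atom:entry", ATOM_NS)
    ]


entries = read_entries(FEED)
for entry_id, title in entries:
    print(entry_id, title)
```

Any consumer that understands feed and entry can list the items this way without knowing what they represent, which is the point of standardizing the container.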
Splunk Replay: Search results in motion
Inspired by glTail.rb and Digg Labs’ Stack, Splunk Replay is an animated data visualization that “replays” search results as a simulated event stream. The application displays events at a rate proportional to the times at which the events originally occurred.
Each event is represented by a single square particle that flows from its place in a legend of values to its corresponding position in a stacked column chart. Upon landing in the column chart, one of the event’s fields is output in a readable format below the chart. Both the legend of values and the stacked column chart retain the order of their values according to a configurable comparator and truncate older values to make space for new ones. Rolling your mouse over…
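The proportional-timing idea can be sketched in a few lines. This is not Splunk Replay's actual code; the function name and the speedup parameter are made up for illustration. It turns original event times into the delays a player would wait between events, so the gaps keep their relative sizes.

```python
def replay_delays(timestamps, speedup=60.0):
    """Return the wait, in seconds, before each event after the first.

    Each delay is the original gap between consecutive events divided by
    the (hypothetical) speedup factor, so relative spacing is preserved.
    """
    ordered = sorted(timestamps)
    return [
        (later - earlier) / speedup
        for earlier, later in zip(ordered, ordered[1:])
    ]


# Event times in seconds since the start of the search window (made up).
event_times = [0, 30, 90, 120]
delays = replay_delays(event_times)
print(delays)
```

With a speedup of 60, two minutes of original events play back in two seconds, and the longest original gap is still the longest pause.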
Common Event Syntax
As part of the Common Event Expression (CEE) effort, a list of field names has been published.
If log records from different log sources have to be correlated or reports have to be generated across different log sources, a common set of field names is needed. Take a firewall log example. Assume that you have two types of firewalls in your environment: Netscreen and PIX. Both devices write different types of log entries. Assume you have a parser for each that extracts fields from its log. The parsers might name the same fields differently, making it very hard, if not impossible, to correlate these two log files. Just think about reporting. How do you find the top source addresses across both logs? These…
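To make the reporting problem concrete, here is a minimal sketch that maps each parser's field names onto one common name before counting source addresses. Every field name in it (src, source_address, src_ip and so on) is hypothetical; none is taken from the published CEE list or from real Netscreen or PIX parsers.

```python
from collections import Counter

# Hypothetical per-parser field names mapped to one common name.
FIELD_MAP = {
    "netscreen": {"src": "src_ip", "dst": "dst_ip"},
    "pix": {"source_address": "src_ip", "dest_address": "dst_ip"},
}

# Made-up parsed records, tagged with the device type that produced them.
EVENTS = [
    ("netscreen", {"src": "10.0.0.5", "dst": "192.168.1.1"}),
    ("pix", {"source_address": "10.0.0.5", "dest_address": "192.168.1.9"}),
    ("pix", {"source_address": "10.0.0.7", "dest_address": "192.168.1.1"}),
]


def normalize(device, record):
    """Rename a parser's fields to the common names; keep unknown fields."""
    mapping = FIELD_MAP[device]
    return {mapping.get(key, key): value for key, value in record.items()}


def top_sources(events, n=10):
    """Count source addresses across all devices using the common name."""
    counts = Counter(
        normalize(device, record)["src_ip"] for device, record in events
    )
    return counts.most_common(n)


top = top_sources(EVENTS)
print(top)
```

Once both parsers emit the same field name, the report across both logs is a single count, with no per-device special cases.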















