thebaumblog: Archive for September, 2008

Splunking VMware virtualization at VMworld

This week things were rocking and we were splunking at VMworld. VMware launched their road map for their Virtual Data Center Operating System (VDC-OS). VDC-OS is VMware’s vision to aggregate virtualized servers, storage and network resources into a common platform that manages resources for guest operating systems and applications. And we launched Splunk for VMware. It’s an application built on top of Splunk that gathers data from different levels of the VMware virtual stack, including the hypervisor configuration, metrics and events, the host operating system, the underlying network, and the guest OS and applications. The application also gives you predefined searches, alerts and reports to troubleshoot and secure your VMware environment. It’s free and you can download it here.

VMware VDC and Splunk for VMware

VDC-OS represents a big leap forward in managing the complexity virtualization foists upon us. Finally vendors like VMware and Microsoft (which will soon ship its own System Center Virtual Machine Manager) admit that managing complex combinations of virtual resources is difficult and important. This is great for monitoring the hypervisor and virtual guest sessions, but what about the resident guest operating systems or applications? It’s still impossible to correlate activity and performance at an application level with resource utilization and performance down to the bare metal.

While these vendors are focused on deploying and tracking the resources themselves, Splunk focuses on providing visibility into the complex interactions and dependencies within a virtual infrastructure. Splunk finds, collects and persists the otherwise perishable log, event and configuration data from dynamic virtual instances as they come and go. Splunk correlates data across tiers in the virtual stack, both inside and outside the hypervisor and guests, including the physical servers, hypervisor, VMs, and deployed applications.

When you point your web browser to the Splunk for VMware application you’ll notice several dashboards already created.

  • VM Metrics Dashboard - a view of the last hour’s memory and CPU utilization across all running VMs so you can pinpoint hot spots.
  • VM Status Dashboard - current configuration, available storage and other key status indicators from different tiers including hypervisor; access & weblogic logs from deployed applications within the guest OS; perfmon, ps and top from the guest OSes.
  • VM Searches Dashboard - all searches, alerts and reports included with Splunk for VMware.

On the searches dashboard you’ll see a number of investigation searches that correlate the VMware API data with OS data from within the guests to perform complex investigations in a single step. This dashboard also shows you the details of predefined alerts like looking for guests with heartbeats, looking for storage capacity problems, and other common issues.

As concepts like VMware’s VDC-OS become reality (sometime in 2009 according to VMware), having the ability to trace transactions through a virtual infrastructure will become even more important. Every layer of management and abstraction (and yes, that’s what virtualization is) means more complexity to manage. Just as with previous VMware products, VDC-OS will not manage physical hardware that has not been virtualized. And understanding how the virtual infrastructure is interacting with non-virtualized servers, storage and networks will remain a critical requirement.

Check out Splunk for VMware and let us know what you think and how we can continue to build on it together.

Splunk in the fast lane. Welcome Godfrey!

Things are moving pretty fast at Splunk and I wanted to comment on the exciting news we announced last week.

In 2004, Erik Swan, Rob Das and I started Splunk with a vision to battle IT complexity by embracing it. We were thinking of things a bit differently: a different way to address the management of IT by applying search to millions of data center artifacts. Traditionally these artifacts were summarized, filtered, reduced and then forgotten, leaving us humans in a pickle when we needed to figure out what’s really going on. For us Splunk was also about a different way to interact with the market, taking an approach of utter transparency. Our public product road maps, freely downloadable software and straightforward marketing had even our early stage venture capital investors thinking we were crazy.

By start-up standards, we seem to have succeeded. Splunk now has more than 250,000 user downloads, more than 750 enterprises, service providers and government agencies worldwide as paying customers, and a growing list of partners who embed Splunk into their software, hardware and managed services, including companies like Cisco and British Telecom. According to my venture capital friends, very few start-ups make it to where we are today. But, fueled by a love for innovation and so many passionate users, we’ve challenged ourselves to see beyond achieving success as a start-up. We believe Splunk can be a company that gets the IT industry thinking differently.

Creating change isn’t easy and we’ll need all the help we can get. Fortunately, we’ve been blessed with an ability to attract top talent at all levels. But our most recent success tops them all. Godfrey Sullivan has joined us as our new President and CEO. When you meet him you’ll realize the incredible passion he has for building great companies. Most recently he was President and CEO of Hyperion Solutions. He took Hyperion over a period of six years to $1B in revenues. Hyperion was acquired by Oracle in 2007 for $3.3B. Godfrey also serves on the board of directors of Citrix Systems, Inc., and Informatica Corporation. Just as important as his business and leadership abilities, Godfrey has the cultural DNA that fits right in at Splunk.

Here’s the yin and yang that is Godfrey. He owns one of only 4,038 2005-2006 Ford GTs. Now this thing is fast, really fast.

  • 0–60 mph (0–96 km/h): 3.3 seconds
  • 0–100 mph (0–160 km/h): 7.3 seconds
  • Standing 1/4 mile: 11.2 seconds @ 134.2 mph
  • Top speed: 212 mph

And his other car is a Toyota Prius. Enough said.

Godfrey couldn’t join us at a better time. We’re scaling all aspects of the business and need the leadership of someone who’s been through this type of explosive growth before. For me personally, it’s pretty cool to work beside someone of his experience, talent and steady-as-she-goes outlook on life.

And I get to continue to do what I do - build things. I’m now leading the team building our partner ecosystem working with Developers, MSPs, Resellers, Technology Partners and System Integrators around the world.

Of course this hyper growth wouldn’t be possible without your passion and support. Thank you all for that.

Happy Splunking!

Life after SIEM. Situational Awareness is next.

We’ve been hearing a lot lately about the death of SIEM technologies. But isn’t the question less about a legacy technology dying and more about the dimensions on which the next mass-adopted security capability will be born? Clayton Christensen first described a model for disruptive technology in his book The Innovator’s Dilemma and his follow-on The Innovator’s Solution. Christensen describes a theory about how disruptive technologies overtake sustaining technologies by delivering value on new dimensions that established vendors overlook as unimportant or low end, or just don’t think about because they’re too busy improving their legacy. Christensen’s work offers an interesting framework to think about what’s taking place in the market for SIEM security management solutions.

Any enterprise trying to secure its IT infrastructure knows the state of the art in SIEM security approaches falls short. And trends like virtualization are making things even more difficult. System and security administrators and analysts are inundated with too many potential incidents, and it’s too difficult and time consuming to investigate even a fraction of them. Achieving a greater comprehension of the meaning of potential incidents and the projection of their status in the near future is the real goal. The idea, called “situational awareness,” is often, however, impossible to achieve. We are so dependent on pre-programmed rules in our SIEM solutions that we lack the ability to perform our own analysis, because the original raw data has been filtered out or thrown away, or we have no practical way to make sense of it.

Observation: If the technology is sufficiently complex as to allow the vulnerability to exist, can we really build complex technology to catch all the possible issues or scenarios?

As a reference point see David Hazekamp, Security Architect at Motorola, talk about the importance of retaining all security data across the Motorola global SOC infrastructure and integrating access to all this data into existing SIEM solutions.

Of course reaching this understanding requires that one suspend one’s disbelief about the effectiveness of current SIEM security technologies. Usually this means you’re not a vendor or you’re a vendor with little or no vested interest in current approaches. So with this let’s examine the typical enterprise deployment of security technologies.

Defense in Depth

This is where every good enterprise security architecture starts. In order to begin securing your environment you’ve got to have data, raw data. In most data centers this takes the form of syslog from network devices and servers, SNMP traps, OPSEC or LEA interfaces for firewall events, WMI for Windows desktop and server events, IDS and IPS signature scans and application level firewall examination of common services like FTP, HTTP, SFTP, SCP etc. The thinking is you need to look at everything. Perhaps you’ll even want to pull in information from physical security systems like badge readers.

Security Information Management (SIM)

The next step in the process is to manage all this raw data and filter it down to a manageable number of events, traps and alerts. Collecting, storing and providing some basic analysis on all this data is the job of a SIM. Typically, as Raffy points out, the data is parsed, normalized and stored in a structured RDBMS. Parsing, normalizing and structuring all this data is great if the data doesn’t change or you don’t have too much of it. But if you’re dealing with data formats that aren’t static or you’re trying to store terabytes of this data an RDBMS won’t be your friend.
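To make the parse-and-normalize step concrete, here is a minimal Python sketch of the idea. It is not how any particular SIM works; the four-field schema, the regular expression and the function names are all assumptions made up for illustration. It shows why a fixed schema gets uncomfortable when formats shift: anything the pattern does not anticipate falls out of the structured store.

```python
import re

# Hypothetical fixed schema: timestamp, host, program, message.
# The pattern assumes classic BSD-style syslog lines and nothing else.
SYSLOG_PATTERN = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[\w\-/]+)(?:\[\d+\])?: "
    r"(?P<message>.*)$"
)


def normalize(line):
    """Return a dict matching the fixed schema, or None if the line does not fit."""
    match = SYSLOG_PATTERN.match(line)
    return match.groupdict() if match else None


def load(lines):
    """Split raw lines into normalized rows and lines the parser could not handle."""
    rows, rejected = [], []
    for line in lines:
        row = normalize(line)
        if row is None:
            rejected.append(line)  # a new or changed format ends up here
        else:
            rows.append(row)
    return rows, rejected


if __name__ == "__main__":
    sample = [
        "Sep 12 10:15:01 web01 sshd[2211]: Failed password for root from 10.0.0.5",
        "2008-09-12T10:15:02Z web01 app: login ok",  # different timestamp format
    ]
    rows, rejected = load(sample)
    print(len(rows), "normalized;", len(rejected), "rejected")
```

The second sample line is perfectly good data, but because its timestamp format was not anticipated it never reaches the normalized store unless someone writes another parser for it.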

Security Event Management (SEM)

Once a SIM has done its job you’re ready to aggregate, correlate and start reporting on potential incidents using a SEM to do the job. SEMs usually consist of lots of rules that look for combinations and patterns of events indicating that a possible attack or breach may be underway. Essentially the SEM rules attempt to codify what we humans know about vulnerabilities in our IT systems and possible ways to exploit them. The goal is to provide some real-time information, usually in the form of reports, dashboards and visualizations, to operations and security analysts who work to keep the infrastructure secure.
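As an illustration of what one such rule might look like, here is a small Python sketch of a "several failed logins followed by a success from the same source" correlation. The event shape, the threshold and the window size are assumptions chosen for the example, not taken from any real SEM product.

```python
from collections import defaultdict, deque


def brute_force_then_success(events, threshold=3, window_seconds=60):
    """Flag sources with `threshold` recent failures followed by a success.

    `events` is an iterable of (epoch_seconds, source, outcome) tuples sorted
    by time, where outcome is "fail" or "ok". This shape is hypothetical.
    """
    recent_failures = defaultdict(deque)
    alerts = []
    for ts, source, outcome in events:
        failures = recent_failures[source]
        # Drop failures that have aged out of the sliding window.
        while failures and ts - failures[0] > window_seconds:
            failures.popleft()
        if outcome == "fail":
            failures.append(ts)
        elif outcome == "ok":
            if len(failures) >= threshold:
                alerts.append((ts, source, len(failures)))
            failures.clear()
    return alerts


if __name__ == "__main__":
    events = [
        (0, "10.0.0.5", "fail"),
        (10, "10.0.0.5", "fail"),
        (20, "10.0.0.5", "fail"),
        (25, "10.0.0.5", "ok"),
        (30, "10.0.0.9", "fail"),
        (31, "10.0.0.9", "ok"),
    ]
    print(brute_force_then_success(events))
```

The rule only catches the pattern someone thought to write down. An attacker who spaces attempts outside the window, or who uses a path the rule never considered, passes straight through.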

Situational Awareness (SA)

SIEM correlation can be interesting for discovering a pattern or related event, but the ability to work an issue outside of these “canned” rules and events becomes the real problem. Unfortunately, what all too often happens is that there are so many possible attacks that operations and security staff are overwhelmed with potential incidents to investigate, and not every event or pattern of interest is going to be discovered via the pre-built rules. Situational awareness is the attempt to perceive environmental elements within a volume of space and time. Comprehension cannot be achieved if the data being bubbled up is filtered according to a set of rules and the technology does not allow a human to perform their own analysis of the raw data as generated by the environment itself. All technologies have their weaknesses and those that perform correlation are no different.

Thus whilst canned SIEM correlation provides value in bubbling things up, we still need the ability to dig into the raw data to fully perceive and comprehend what is taking place. Mind you, SA is not a new concept. It has been applied rather robustly by decision-makers in complex, dynamic areas such as aviation, air traffic control, power plant operations, and military command and control, as well as in more ordinary but nevertheless complex tasks such as driving an automobile or motorcycle. And yes, it has been mentioned before in security operations, particularly in government agencies.
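To show what digging into the raw data means in the simplest possible terms, here is a Python sketch of an ad hoc search over retained, unparsed events. It is a plain linear scan written for illustration, not a description of how Splunk or any other product indexes or searches data, and the function name and event shape are assumptions.

```python
def search_raw(raw_events, *terms, start=None, end=None):
    """Return raw events containing every term (case-insensitive) in a time range.

    `raw_events` is an iterable of (epoch_seconds, raw_line) pairs. Nothing is
    parsed or filtered ahead of time, so new questions can be asked later.
    """
    lowered = [term.lower() for term in terms]
    hits = []
    for ts, line in raw_events:
        if start is not None and ts < start:
            continue
        if end is not None and ts > end:
            continue
        text = line.lower()
        if all(term in text for term in lowered):
            hits.append((ts, line))
    return hits


if __name__ == "__main__":
    retained = [
        (100, "web01 sshd: Failed password for root from 10.0.0.5"),
        (160, "fw01 DROP src=10.0.0.5 dst=192.168.1.20 dport=22"),
        (220, "web01 httpd: GET /index.html 200"),
    ]
    # A question nobody wrote a rule for: everything that mentions this address.
    for ts, line in search_raw(retained, "10.0.0.5"):
        print(ts, line)
```

Because the firewall line and the sshd line were both kept as generated, one search ties them together even though they came from different devices in different formats.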