thebaumblog: virtualization

Chad’s Army

I stumbled upon this unexpected post from Chad Sakac of EMC talking about the VMware/EMC/Cisco collaboration.

For anyone who has spent their career on the start-up track in Silicon Valley this is not a novel story.

Isn’t it fantastic to see some large companies still have the mojo of entrepreneurship and fast moving initiatives that survive outside of the normal organizational structure?

While it remains to be seen how successful VCE, Acadia and Vblock will be, it sure is exciting to have the industry talking about radically new approaches to simplify computing! Here is a great post summarizing Vblock from Mark Bowker @ Enterprise Strategy Group. Now if we can only get access to that lab and get Splunk running on one of those Vblocks … hmmmm.

Splunking VMware virtualization at VMworld

This week things were rocking and we were splunking at VMworld. VMware launched their road map for their Virtual Data Center Operating System (VDC-OS). VDC-OS is VMware’s vision to aggregate virtualized servers, storage and network resources into a common platform that manages resources for guest operating systems and applications. And we launched Splunk for VMware. It’s an application built on top of Splunk that gathers data from different levels of the VMware virtual stack, including the hypervisor configuration, metrics and events, the host operating system, the underlying network, and the guest OS and applications. The application also gives you predefined searches, alerts and reports to troubleshoot and secure your VMware environment. It’s free and you can download it here.

VMware VDC and Splunk for VMware

VDC-OS represents a big leap forward in managing the complexity virtualization foists upon us. Finally, vendors like VMware and Microsoft (which will soon ship its own System Center Virtual Machine Manager) admit that managing complex combinations of virtual resources is difficult and important. This is great for monitoring the hypervisor and virtual guest sessions, but what about the resident guest operating systems or applications? It’s still impossible to correlate activity and performance at an application level with resource utilization and performance down to the bare metal.

While these vendors are focused on deploying and tracking the resources themselves, Splunk focuses on providing visibility into the complex interactions and dependencies within a virtual infrastructure. Splunk finds, collects and persists the otherwise perishable log, event and configuration data from dynamic virtual instances as they come and go. Splunk correlates data across tiers in the virtual stack, both inside and outside the hypervisor and guests, including the physical servers, hypervisor, VMs and deployed applications.

When you point your web browser to the Splunk for VMware application you’ll notice several dashboards already created.

  • VM Metrics Dashboard - a view of the last hour’s memory and CPU utilization across all running VMs so you can pinpoint hot spots.
  • VM Status Dashboard - current configuration, available storage and other key status indicators from different tiers including hypervisor; access & weblogic logs from deployed applications within the guest OS; perfmon, ps and top from the guest OS’s.
  • VM Searches Dashboard - all searches, alerts and reports included with Splunk for VMware.

On the searches dashboard you’ll see a number of investigation searches that correlate the VMware API data with OS data from within the guests to perform complex investigations in a single step. This dashboard also shows you the details of predefined alerts like looking for guests with heartbeats, looking for storage capacity problems, and other common issues.

As concepts like VMware’s VDC-OS become reality (sometime in 2009, according to VMware), having the ability to trace transactions through a virtual infrastructure will become even more important. Every layer of management and abstraction (and yes, that’s what virtualization is) means more complexity to manage. Just as with previous VMware products, VDC-OS will not manage physical hardware that has not been virtualized. And understanding how the virtual infrastructure is interacting with non-virtualized servers, storage and networks will remain a critical requirement.

Check out Splunk for VMware and let us know what you think and how we can continue to build on it together.

Ode to Log Management

I love “log management.” I hate log management.

I love log management because years ago it was the impetus for IT to move beyond simple SNMP monitoring to collecting and trying to understand a much richer set of data about complex environments.

I hate log management because over the years it has been co-opted by vendors and analysts who’ve pigeonholed it into yet another IT management silo. These vendors and analysts have narrowly defined log management as the collection and storage of logs in some locked repository used to generate static reports to satisfy regulators, auditors and IT governance boards.

Why am I so bitter?

First, it turns out logs are critical to many other stakeholders in the enterprise. Operations needs real-time access to logs in order to find and fix problems and improve mean time to recovery (MTTR). Security needs logs to catch bad guys. Business people need logs to understand customer and service behavior and provide service level measurements. So locking up logs in a static repository designed for one constituency severely limits their value and diminishes the return on investment, not only in a log management solution but also in your IT assets overall.

Secondly, logs alone don’t provide any one of the IT stakeholders with a complete picture.

Let’s take a simple example right from the hottest compliance use case today — PCI. The Payment Card Industry (PCI) Security Standards Council founded by American Express, Discover Financial Services, JCB International, Mastercard and Visa has outlined requirements for security management, policies, procedures, network architecture and software design. If you are a merchant accepting credit or debit cards and you process more than 20,000 transactions per year there are twelve specific requirements. Failure to comply with the requirements is not an option. You can be fined heavily and you can lose your ability to accept credit and debit cards.

One of the twelve requirements is the commitment to monitoring and investigating changes to configuration and password files for any application, server or device involved in the processing of cardholder information and transactions. In the case of file content, permissions or attribute changes, logs will only tell me part of the story. Yes, a Windows, Linux or Unix log will tell me a file has been changed, but it won’t tell me who changed it. It also won’t tell me if the change was authorized or not. To understand who changed a file I need to look at the other user processes running on that server at the same time the file was changed. What user processes were running and who owned them? In Unix or Linux this information is easily viewed with a simple “ps” or “top” command but doesn’t exist in any log. In order to understand if the change was authorized or not, I need to compare the log and file change information with the user information and any tickets from the service desk authorizing this user to make this type of modification.
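To make that correlation concrete, here is a rough Python sketch of the idea. It assumes you already have periodic “ps” snapshots and service desk tickets parsed into records; the record shapes, field names and the one-minute window are all made up for illustration and don’t come from any particular product. Note that it only narrows down who was active around the time of the change. It doesn’t prove who made it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ProcessSample:
    """One row from a periodic `ps` snapshot (hypothetical schema)."""
    taken_at: datetime
    user: str
    command: str


@dataclass
class Ticket:
    """A service desk change ticket (hypothetical schema)."""
    user: str
    path: str
    valid_from: datetime
    valid_to: datetime


def candidate_users(changed_at, samples, window=timedelta(minutes=1)):
    """Users who owned a process sampled within `window` of the change."""
    return sorted({s.user for s in samples
                   if abs(s.taken_at - changed_at) <= window})


def is_authorized(user, path, changed_at, tickets):
    """True if some ticket lets `user` change `path` at that time."""
    return any(t.user == user and t.path == path
               and t.valid_from <= changed_at <= t.valid_to
               for t in tickets)


def investigate(path, changed_at, samples, tickets):
    """Map each candidate user to whether a ticket covers the change."""
    return {user: is_authorized(user, path, changed_at, tickets)
            for user in candidate_users(changed_at, samples)}
```

Given a file change event from the log, investigate() returns the users active around that moment and flags the ones with no matching ticket. Those are the people to go ask questions of first.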

The real reason I believe we need to move on from talking about log management is log management isn’t a market. It isn’t a solution. It is a feature in a much broader landscape of harnessing all the data being generated by our IT infrastructures.

Turning all that data into information for every stakeholder is important to the future of IT as environments grow more complex, dynamic, service oriented, virtualized and mission critical. Not just to report on compliance controls, but to improve our speed of root cause analysis, increase our ability to quickly and comprehensively investigate security attacks and develop more intimate relationships with our customers by better understanding their behavior and providing a transparent view of the services they are receiving in return.

Doom and Gloom Everywhere But Here

The US economy is heading into a recession and technology spending is in for a steep decline in 2008. So every major prognosticator and news outlet from the Wall Street Journal to the Financial Times would have us believe.

Are these people watching the same movie I am? There are two problems I have with this economic hyperbole. Yes that’s what it is. I guess it sells newspapers and gets people to watch things like CNBC. But boy is it misleading.

First of all, in macroeconomics, a recession is a decline in a country’s gross domestic product (GDP), or negative real economic growth, for two or more successive quarters. Yet nobody that I’ve read is forecasting negative growth. They’re forecasting a potential slowdown in growth from the current 3.5% per quarter to 1.5 to 2.5% per quarter. But the news outlets feel compelled to use the “R” word just to get attention. Totally irresponsible.

On to my second gripe. With regard to technology and IT spending, I believe, based on what I see, we are in the beginning of a long-term gradual increase in IT spending within large enterprises that started eighteen to twenty-four months ago.

Sure, the current credit crisis may have a short-term impact on budgets within Financial Services companies, but I don’t see any slowdown yet. The major consumer, commercial and investment banks we work with have so many critical, revenue-generating IT projects in backlog that I fail to see how spending is going to slow at all. The telecommunication sector is finally back on the mend after the bubble and hangover of the early 2000s.

Social media, online shopping and the always-on dimension of the Internet have online services and large Internet sites like MySpace and Amazon accelerating software, hardware and services spending just to keep up. And security, privacy and compliance initiatives and mandates have companies, service providers and government agencies increasing spending on these items by some 20% or more in 2008 to try to limit their exposure and risk.

Just a month ago the Financial Times had a great piece entitled “What’s on CIO wishlists?” Here’s a quick summary.

1. Business alignment and strategy
2. Hiring and retaining the best staff
3. IT innovation/new methodologies
4. Security
5. Collaboration technologies
6. Controlling costs
7. Compliance and regulation
8. Virtualisation
9. Customer service
10. Mobility (Green issues came 11th)

Doesn’t look like a slowdown to me.