thebaumblog: Business Intelligence

Splunk Live New York 2009

This week we’re on the East Coast enjoying some fantastic customer presentations and roundtables at Splunk Live events in New York City, Princeton NJ and Washington DC. It’s Tuesday and we have more than 100 customers and Splunk users attending Splunk Live in midtown Manhattan. The vibe is electric as we’re being treated to awesome talks by IDT and New York Life. At lunch, long-term customers Bloomberg and AT&T joined the customer roundtable conversation.

Gabe Arnett, Senior Software Architect at Moody’s, demonstrated how Splunk is being used to monitor and troubleshoot the Moody’s Analytics platform. Gabe has more than 15 years of experience building web applications in financial services, investment banking and e-commerce. At Moody’s he’s responsible for the global development team that develops and supports the newly redesigned client-facing website, v3.moodys.com. Moody’s is a leading provider of research, data, analytic tools and related services to debt capital markets and credit risk management professionals. The company’s products and services provide the means to assess and manage the credit risk of individual exposures as well as portfolios; price and value holdings of debt instruments; analyze macroeconomic trends; and enhance customers’ risk management skills and practices.

Moody’s Splunk environment is used by 25 different users and runs on Windows 2003. Splunk gives Gabe’s developers secure access to the logs they need without touching the production devices, servers and applications. His team has built custom searches and a number of dashboards indicating the general health of their applications and services. Custom searches and alerts track errors and access to guarantee a good user experience. The team also uses Splunk to understand when and where new content isn’t flowing to the v3 platform. A large part of the Moody’s user experience is delivering email alerts, and Splunk helps the team track GUIDs to ensure customers receive the alerts they’ve subscribed to.

The team recently migrated from Splunk 3 to Splunk 4, and the upgrade took 30 minutes. The Splunk for Windows App has been significantly revamped in Splunk 4, and the Moody’s team is using it to monitor local server resources (disk, memory, networking) through WMI and to correlate this performance data with the Windows and Application event logs.

Shay Benjamin, CSO and SVP, Architecture at IDT, designs and implements network architectures and manages compliance, security and fraud initiatives. IDT Corporation (www.idt.net) is a holding company focused on the telecommunications and energy industries. Since 1995 they’ve been building hundreds of VOIP switches globally and assembling an international fiber optic network. IDT pioneered VOIP (Voice over Internet Protocol) to create Net2Phone, piloted the first commercial WiFi phone service in the US and has created a prepaid calling card business, which sells 12 million calling cards a month.

IDT uses Splunk primarily for VOIP Call Detail Records (CDRs). The company indexes more than 120 million CDRs per day with six mirrored Splunk server instances. CDRs are somewhat like logs, but with many fixed, delimited fields. One or more CDRs are created at each switching or routing point for every VOIP call. CDRs vary between platform devices in number of fields and contents and, unlike logs, few CDR fields contain easy-to-read key=value pairs. Although CDRs are key to maintaining service quality, billing, network quality monitoring and security forensics, working with them is labor intensive, and any delay wastes labor, time and money.
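To make the contrast with key=value logs concrete, here is a minimal Python sketch of turning a fixed, delimited CDR into named fields. The platform names, field layouts, delimiters and sample records are all invented for illustration; they are assumptions, not IDT’s actual CDR formats.

```python
# Hypothetical per-platform layouts. Field names, order and delimiters
# are assumptions made up for this example.
CDR_LAYOUTS = {
    "switch_a": {
        "delimiter": "|",
        "fields": ["start_time", "caller", "callee", "src_ip", "trunk_group", "duration_s"],
    },
    "switch_b": {
        "delimiter": ",",
        "fields": ["callee", "caller", "duration_s", "gateway_ip"],
    },
}


def parse_cdr(platform, line):
    """Split one delimited CDR line into a dict of named fields."""
    layout = CDR_LAYOUTS[platform]
    values = line.strip().split(layout["delimiter"])
    if len(values) != len(layout["fields"]):
        raise ValueError(
            f"expected {len(layout['fields'])} fields for {platform}, got {len(values)}"
        )
    record = dict(zip(layout["fields"], values))
    record["platform"] = platform  # remember which format the record came from
    return record


# Invented sample records, one per hypothetical platform.
sample_a = "2009-06-09T14:02:11|12125550101|442075550199|10.1.2.3|NYC-TG-07|184"
sample_b = "442075550199,12125550101,184,10.9.8.7"

record_a = parse_cdr("switch_a", sample_a)
record_b = parse_cdr("switch_b", sample_b)
```

The point of the sketch is that the meaning of each value lives in the layout, not in the record itself, which is why each platform’s format has to be described separately before the data becomes searchable.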

IDT needs fast searches across all fields of the CDRs and quick data loading, to allow fast retrieval of call data and cross-platform searches that unify results from different CDR formats. Historically IDT used a custom RDBMS solution with an application called Call Genius. In their RDBMS, IDT was forced to limit the fields that get indexed because indexing CDRs in an RDBMS is costly: it takes up a lot of space and slows load times. The RDBMS also indexed only the fields common to multiple platforms’ CDRs. In the RDBMS solution much of the CDR data was put into BLOBs (actually CLOBs), with multiple CDR fields mapped into a single RDBMS field to try to achieve efficiency. But BLOBs can be very difficult to search and are difficult to index effectively. The legacy Call Genius application didn’t permit searching CDR BLOBs.

Now IDT uses Splunk to index all CDR fields. There is no need to decide which fields to index, and cross-platform searches are easy without losing the resolution of each platform’s specific CDR format. There is no longer a need to create BLOBs for efficiency. Engineers and support staff are able to quickly search for any combination of:

  • Phone Number
  • IP address
  • Trunk Group Name

Splunk naturally and easily links search terms across fields and the users just need to enter the phone number or IP and get back the CDR events and transactions.
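As a rough illustration of what searching a term across every field means, the Python sketch below scans records from two differently shaped CDR formats and returns those in which every search term matches some field, whatever that field is called. The records and field names are hypothetical, and the linear scan is only a stand-in for the idea; it says nothing about how Splunk’s index actually works.

```python
def search_cdrs(records, *terms):
    """Return the records in which every term exactly matches some field value."""
    hits = []
    for record in records:
        values = set(record.values())
        if all(term in values for term in terms):
            hits.append(record)
    return hits


# Invented records from two hypothetical platforms with different fields.
records = [
    {"platform": "switch_a", "caller": "12125550101", "callee": "442075550199",
     "src_ip": "10.1.2.3", "trunk_group": "NYC-TG-07"},
    {"platform": "switch_b", "callee": "442075550199", "caller": "12125550101",
     "gateway_ip": "10.9.8.7"},
    {"platform": "switch_a", "caller": "13105550123", "callee": "12125550101",
     "src_ip": "10.1.2.3", "trunk_group": "LAX-TG-02"},
]

by_number = search_cdrs(records, "12125550101")
by_number_and_ip = search_cdrs(records, "12125550101", "10.1.2.3")
by_trunk = search_cdrs(records, "NYC-TG-07")
```

The caller never says whether the phone number is a caller or a callee, or which platform’s IP field to look in; the term is simply matched wherever it appears.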

Comparing Splunk to the RDBMS solution, IDT found Splunk searches to be 50 to 100x faster on data that was not indexed in the RDBMS. Searches on indexed fields are also faster in Splunk than in the previous RDBMS solution. Splunk load times for a typical sample average 1 to 5 minutes versus 20 to 40 minutes for the RDBMS.

IDT is in the process of feeding firewall, security, router, IP network, and switch data into Splunk as well. They’re already discovering that Splunk finds errors not captured by Network Management Consoles, and it has provided valuable troubleshooting help during recent datacenter migrations.

Most of all, IDT is looking forward to discovering new ways to use all the data in Splunk. Heuristic analysis and business intelligence applications are at the top of their list, including the use of Splunk to find human “Family and Friends” networks and drive the development of new commercial programs.

Splunk Live Southwest 2008

This week we’ve been moseying through the Southwestern part of the US with our Splunk Live show. We changed up the format a bit with Splunk technical workshops in the morning and customer round tables in the afternoon. The technical workshops were a big hit with more than 200 people registered to engage with our Splunk Experts. During the workshop you were able to download, install, configure and start using Splunk on your laptop or server with remote access. The best part about Splunk Live events though is sharing ideas with other Splunk fanatics.

Ryan Peterson from Infusionsoft, a marketing automation company, gave a great talk in Scottsdale about his Splunk deployment for the company’s email infrastructure. Ryan is tasked with keeping more than 12M emails a week flowing out of the system to support Infusionsoft’s Automated Follow-up Technology (AFT). Ryan has multiple servers in different geographies in addition to PCI Compliance requirements. He demonstrated using Splunk to troubleshoot problems spread across the messaging infrastructure, address reporting inaccuracies and deliver PCI reports to auditors. He’s even indexing the content of email with Splunk using a scripted LDAP data input. Cool stuff.

In San Diego Tony Doan of the Genomics Institute at the Novartis Research Foundation (GNF) and Eric Van Johnson from Sony Consumer Electronics joined us. Tony is a security engineer and former pen tester. He also confesses to being a recovering Unix sysadmin. GNF has 600 Windows desktops and several hundred Windows and Linux servers supporting the discovery of new biological processes and improved human therapeutics. Tony discussed how they splunk Cisco CSC, Bluecoat, Symantec AV, Arpwatch, Cisco switches and WiFi access points to find what he calls “previously unknowns” to improve operational availability and security. He says they’re finding new uses every day, but Tony’s favorite is splunking Cisco IPS and Cisco MARS events looking for odd behaviors. Next up for GNF is eating Windows Event Logs and Windows Registry inputs together with summary indexing for consolidated reporting.

Eric Van Johnson is the eServices Hosting and Operations Manager at Sony Consumer Electronics. He led a great discussion on splunking IBM WebSphere and MQ Series events, including how Sony has integrated operations and development environments to identify problems with complex apps more quickly and avoid unnecessary escalations to the development team. He shared with us Sony’s rollout of Splunk to their Business Intelligence Group. The idea is to complement aggregated WebMethods data reporting for business activity monitoring. Next up he wants to feed Splunk data back and forth with Verizon’s hosting operations since some of the Sony servers are hosted at Verizon and Verizon is also using Splunk.

In LA Rich Horace, Director of Systems Engineering and Operations at Fox Interactive Media, demonstrated how Fox uses Splunk in the Fox Audience Network. Basically these are the guys that serve web advertisements across all the Fox properties including MySpace, Rotten Tomatoes, Fox Sports and IGN. He’s challenged with launching new monetization platforms and keeping the existing ones running. Rich gave a fantastic overview of his Splunk installation, which consolidates and aggregates data from disparate systems in order to protect against hackers and meet PCI and SOX requirements. He currently runs an environment with ~600 Linux servers, load balancers, servers, NetApps and network switches. So far he’s indexed 1.5B events. We engaged with everyone in a lively discussion about securing production sites from developers and controlling and auditing access to data using Splunk’s access controls and search filters. Rich also discussed how Fox is using Splunk to integrate with various Citrix products including NetScaler and XenApp.

Thanks to everyone who shared their stories with us this week. It was really awesome.

Ode to Log Management

I love “log management.” I hate log management.

I love log management because years ago it was the impetus for IT to move beyond simple SNMP monitoring to collecting and trying to understand a much richer set of data about complex environments.

I hate log management because over the years it has been co-opted by vendors and analysts who’ve pigeonholed it into yet another IT management silo. These vendors and analysts have narrowly defined log management as the collection and storage of logs in some locked repository used to generate static reports to satisfy regulators, auditors and IT governance boards.

Why am I so bitter?

First it turns out logs are critical to many other stakeholders in the enterprise. Operations needs real time access to logs in order to find and fix problems and improve mean time to recovery (MTTR). Security needs logs to catch bad guys. Business people need logs to understand customer and service behavior and provide service level measurements. So locking up logs in a static repository designed for one constituency severely limits their value and diminishes the return on investment not only in a log management solution but also the return on your IT assets overall.

Secondly, logs alone don’t provide any one of the IT stakeholders with a complete picture.

Let’s take a simple example right from the hottest compliance use case today — PCI. The Payment Card Industry (PCI) Security Standards Council founded by American Express, Discover Financial Services, JCB International, Mastercard and Visa has outlined requirements for security management, policies, procedures, network architecture and software design. If you are a merchant accepting credit or debit cards and you process more than 20,000 transactions per year there are twelve specific requirements. Failure to comply with the requirements is not an option. You can be fined heavily and you can lose your ability to accept credit and debit cards.

One of the twelve requirements is the commitment to monitoring and investigating changes to configuration and password files for any application, server or device involved in the processing of card holder information and transactions. In the case of file content, permissions or attribute changes, logs will only tell me part of the story. Yes a Windows, Linux or Unix log will tell me a file has been changed but it won’t tell me who changed it. It also won’t tell me if the change was authorized or not. To understand who changed a file I need to look at the other user processes running on that server at the same time the file was changed. What user processes were running and who owned them? In Unix or Linux this information is easily viewed with a simple “ps” or “top” command but doesn’t exist in any log. In order to understand if the change was authorized or not I need to compare the log and file change information with the user information and any tickets from the service desk authorizing this user to make this type of modification.
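The correlation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the process snapshots stand in for periodically captured “ps” output, the ticket records stand in for a service desk export, and every name, path, timestamp and the one-minute window is hypothetical.

```python
from datetime import datetime, timedelta


def candidate_users(change_time, process_snapshots, window=timedelta(minutes=1)):
    """Users who owned a process on the host within `window` of the file change."""
    return {
        snap["user"]
        for snap in process_snapshots
        if abs(snap["time"] - change_time) <= window
    }


def is_authorized(user, path, change_time, tickets):
    """True if some ticket allows `user` to modify `path` at `change_time`."""
    return any(
        t["user"] == user and t["path"] == path and t["start"] <= change_time <= t["end"]
        for t in tickets
    )


# Invented inputs: one file-change event, ps-style snapshots, service desk tickets.
change = {"path": "/etc/passwd", "time": datetime(2008, 3, 4, 2, 15, 30)}
snapshots = [
    {"time": datetime(2008, 3, 4, 2, 15, 0), "user": "root", "command": "vi /etc/passwd"},
    {"time": datetime(2008, 3, 4, 2, 15, 0), "user": "jdoe", "command": "bash"},
    {"time": datetime(2008, 3, 4, 9, 0, 0), "user": "asmith", "command": "top"},
]
tickets = [
    {"user": "jdoe", "path": "/etc/passwd",
     "start": datetime(2008, 3, 4, 2, 0, 0), "end": datetime(2008, 3, 4, 3, 0, 0)},
]

users = candidate_users(change["time"], snapshots)
report = {
    user: is_authorized(user, change["path"], change["time"], tickets)
    for user in sorted(users)
}
```

Neither the log, the process list nor the ticket answers the question alone; only joining the three on time, user and file path separates an authorized change from one that needs investigating.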

The real reason I believe we need to move on from talking about log management is that log management isn’t a market. It isn’t a solution. It is a feature in a much broader landscape of harnessing all the data being generated by our IT infrastructures.

Turning all that data into information for every stakeholder is important to the future of IT as environments grow more complex, dynamic, service oriented, virtualized and mission critical. Not just to report on compliance controls, but to improve our speed of root cause analysis, increase our ability to quickly and comprehensively investigate security attacks and develop more intimate relationships with our customers by better understanding their behavior and providing a transparent view of the services they are receiving in return.

The Splunk Platform Has Launched

Without a doubt the past week has been the most amazing week in Splunk history. The crazy coast to coast multi-city launch left us all exhausted and electrified. A few of the things that stick in my mind…

First Splunk 3.2 including Splunk for Windows went live on our download page last Saturday and more than 40% of our downloads in the past week have been for our new Windows version. Then Nick Selby of 451 Group wrote an analyst brief on us. He said, “Splunk is awesome: it’s multiplatform, easy to install and easy to use. And with an abstraction layer of logs, configuration files and system messages, traps and alerts, it’s seriously useful.” 451 has a reputation for ripping vendors, so we’re flattered.

Dana Gardner, analyst with Interarbor wrote a very eloquent analysis of our platform launch on ZD Net. “Splunk has created the means to offer developers easy access to that data and the powerful inferences gleaned from comprehensive IT search. That means the data can go places no log file has gone before,” says Dana. Developers are certainly doing some way cool things with Splunk.

I’ve seen a couple of neat visualization applications including this one called Replay. It shows you a live or time lapsed view of your event streams. Here you can see the replay application hooked up to our internal wiki showing who’s doing what over a 24 hour period. Click on the image for the movie.

replay.png

As for our own applications, the Splunk for PCI app drew tremendous interest at our series of Splunk Live events this past week. It’s just one example of how a business person with domain knowledge can package their own Splunk configuration as an application. If you haven’t seen Raffy’s video on the PCI Application, check it out here.

pci.png

We also showed the Splunk for Change Management application. Seeing someone touch a file and watching the Splunk dashboard update instantaneously is an awesome display of how flexible Splunk has become. Check out the developer program for yourself and get your goods up on SplunkBase so we can all check em out.

changemgmt.png

Welcome!

I’m Michael Baum. Welcome to my blog.

I hope to find time to write about some of my favorite topics including:

  • Splunk and IT Search.
  • Technology gadgets and software — the stuff we all like to use.
  • Datacenter applications, servers, networks and security — the stuff we all have to keep running.
  • Business, entrepreneurship and venture capital.
  • Wall Street and investing.

Comments are always welcome and you can also reach me via email at thebaum (at) splunk (dot) com.