thebaumblog: Archive for September, 2009

Splunk Live Washington DC 2009

Obama-nomics is highly visible in our nation’s capital these days. The DC economy is humming as our tax dollars are hard at work fueling all kinds of government spending. With more than 100 attendees at Splunk Live on Thursday, we certainly were not disappointed in our quest to help make all this growth in government more efficient! Managing large networks and security forensics were the hot topics of conversation at Splunk Live Washington, DC, where everyone was treated to a trio of incredible speakers.

Our first speaker was Andy Purdy, Co-Director of the International Cyber Center at George Mason University and former Acting Director of the National Cyber Security Division (NCSD) and US-CERT at the Department of Homeland Security. Andy was a member of the White House staff team that drafted the U.S. National Strategy to Secure Cyberspace (2003) and served on the DHS tiger team that formed the NCSD. He spent 3 1/2 years at DHS, the last two heading the NCSD and US-CERT as the “Cyber Czar” of the U.S. Andy is also a Special Government Employee on the Defense Science Board Task Force on Mission Impact of Foreign Influence on DoD Software, and a partner with the law firm of Allenbaugh Samini Gosheh, LLP.

The Constantly Changing Threat Landscape

Andy talked with us about the changing threat landscape and the lessons learned from past approaches to cyber security that can be applied in a forward-looking approach to Risk Management and Compliance.

Since much of his experience has been spent preparing the country for the cyber threats coming next, Andy thinks of IT security as a war fought in a constantly morphing theater with new technologies and vulnerabilities and new motivations and threats.

A Different Approach Moving Forward

For anyone serious about security this is a sound perspective, whether you are a government agency, a major enterprise or a small business. But the balance between open networks and services and robust security remains one of the major challenges for IT organizations. Andy pointed us to lessons learned from his past, fueling a vibrant conversation during the customer and speaker roundtable. Perhaps the most important thing I heard was that it’s not enough to prepare for the last war, or the last successful attack. While perimeter defense and legacy network security standards provide some measure of security, those measures are very often insufficient against new threats that seem to be gaining in sophistication at an accelerating pace. Andy encouraged us to focus on adopting new requirements and security infrastructure for situational awareness and control.

Greater sophistication, slower and lower-level attacks, and greater knowledge about the targets (data, activity, vulnerabilities) are all driving the need for near-real-time visibility at large scale. This has become far more important than sub-second correlation of known attack vectors against discrete sets of network devices.

“NIST perspective: Continuing serious cyber attacks on federal information systems, large and small; targeting key federal operations and assets. Attacks are organized, disciplined, aggressive, and well resourced; many are extremely sophisticated. Adversaries are nation states, terrorist groups, criminals, hackers, and individuals or groups with intentions of compromising federal information systems.”

Andy went on to discuss how the effective deployment of malicious software, causing significant exfiltration of sensitive information (including intellectual property) and threatening to disrupt critical information systems and services, has made detecting information and data leakage a key government and enterprise security requirement.

Bob Flores, former CTO and 31-year veteran of the CIA, was our next speaker. Bob retired from the CIA six months ago and is now President and CEO of Applicology, providing cyber security and IT strategy consulting services. In his 31 years at the CIA, he held various positions in the Directorate of Intelligence, the Directorate of Support, and the National Clandestine Service. Most recently he was the CIA’s CTO, responsible for ensuring that the Agency’s technology investments matched the needs of its many missions. Bob holds Bachelor of Science and Master of Science degrees in Statistics from Virginia Tech.

Quis custodiet ipsos custodes?

Brush up on your Latin! “Who guards the guards?” was the topic of Bob’s talk. In an ever-changing threat landscape, the insider threat was and remains our number one cyber security risk.

“Defense-in-depth isn’t just about putting adequate technology in place, it’s also about paying attention to your people and implementing policies and procedures to reduce the likelihood of an insider attack.”
- Dawn Cappelli, CERT

The simple but not-so-obvious model Bob pursued at the CIA extended the OSI stack with non-technical, motivational layers.


We need to worry about all levels of the stack, including layers eight and nine, because we all have people messing around at various layers with applications, scripts, communications, etc. And their motivation is often very clear.

“Nemo repente fuit turpissimus!” Or: no one ever became thoroughly bad in one step!

The point is people don’t just wake up one day and decide to be bad. They are motivated over time by larger causes and in EVERY CASE leave a trail of clues behind that can’t entirely be covered up.

What to Do?

According to Mr. Flores, the focus needs to be on real-time visibility: who (or what) is perturbing your enterprise right now and over time. You can tediously review the logs of each device and user, as the CIA used to do, or you can take advantage of Splunk.

“Splunk may not be the best thing since sliced bread, but it’s pretty darn close.”
- Bob Flores

Why Splunk?

Why did the CIA choose Splunk over so many other security forensics solutions? It all comes down to how easily and scalably Splunk can eat any logs, events and messages Bob’s organization throws at it. Combine that with the real-time search, alert and reporting and over time statistics and analysis on

Splunk Live Princeton 2009

It’s Wednesday and we’re at Splunk Live Princeton, NJ. What an awesome place. Princeton is home to a great university and some great culinary experiences. Check out Mediterra, an interesting mix of Italian and Spanish influences. Apparently it’s where all the Princeton parents treat their kids to dinner when they are in town. Next door to our venue was the great hope for the state of NJ: a new Governor. The current Governor has turned the state budget and tax base into toxic waste. Well, things went much better for the more than 60 Splunk Live attendees in Princeton today, who gained insight into how a number of large Splunk customers keep their mission-critical applications running in a time of IT budget slash and burn.

Matthew Stevens, Director of Software Systems and Architecture at Comcast, provides guidance to Comcast executives on mission-critical media systems and strategic systems architecture. Comcast is the country’s largest provider of cable services, serving 23.9 million cable customers, 15.3 million high-speed Internet customers and 7.0 million Comcast Digital Voice customers.

Comcast Developer Network

Matthew’s latest project is the Comcast Developer Network, a Comcast-scale secure web services platform for developing cool new media and entertainment offerings. The Comcast Web Platform environment generates billions of software events each day from caching and load balancing, origin application servers, databases, middleware, and content delivery networks for images and video streams. Comcast services demand high quality: much of the Comcast content is exclusive, and premium services drive revenue. Interfaces between technology components (applications, delivery platforms) need to adhere to best practices to ensure the best possible end-customer experience.

Why Splunk?

Comcast has acquired many system and application management platforms over the years, but nothing was providing the team with the robust information from operational telemetry the teams around the company need to ensure data integrity, stability, application quality and efficiency. Several efforts specifically drove Comcast to consider and deploy Splunk.

  • Product rollout: The team wanted the ability to predict and correct potential issues before going live in production. Splunk has become a required best practice for new product rollouts.
  • Network/System Integrity: Understanding security and user experience across a very large network and set of systems is a must to protect the business. Splunk provides the insight the network and system teams need across many different technology silos.
  • Business Intelligence: Having immediate access to real-time events and historical trends allows the various Comcast business teams to react quickly and adapt to changing customer behaviors.
  • Agility: Alerts and Dashboards indicate discrepancies so distributed teams can investigate immediately and remediate failures and attacks.

Video CDN/CMS Performance

“In content management systems and delivery networks a devil walks the long tail. If you’re facing concurrent hits across the tail of the curve, sharpen your pencil, you’ve got problems!”

Splunk helps Comcast understand the risks of instability in its systems, especially during periods of high concurrency. Through pre-production modeling of event patterns and subsequent monitoring of those patterns, Splunk pays for itself by helping Comcast avoid deploying vulnerable systems, downtime, and upset customers.

Predicting System Imbalance

Comcast has successfully used Splunk to evaluate potential infrastructure vendors’ solutions and determine whether they will balance loads properly across a large, indeterminate infrastructure. Often the answer is no, as illustrated here in a Splunk report of resource utilization across various services.

Splunk has also been utilized to see whether solutions will be resilient to different traffic patterns, helping the company perform predictive analysis before making critical infrastructure investments.

Load testing is performed during off-peak hours, and the results are analyzed for system failures over time using the telemetry data Splunk can correlate across various logs, messages and events.

When failures are found the Comcast team uses Splunk reports to dig deeper into the data.


Security and Compliance

In addition to operations use cases, Comcast security and compliance teams leverage the consolidated logs across data centers to enable faster threat assessment and security monitoring.

  • Monitoring for bad actors to trigger alerts,
  • Conducting threat detection over time,
  • Detecting attacks/vulnerabilities in systems and
  • Auditing systems in support of security assessments and compliance.

What’s Next?

Next up for Matthew and team is the launch of the Comcast CodeBig Platform, enabling a network of developers to create content for the network. Some of these developers already use Splunk in their own managed services, such as Mashery. Comcast is working to hook the Mashery Splunk installation to its own in order to provide visibility across multiple services and providers of content and entertainment functionality.

Chris Abboud manages the Enterprise Systems Management team at Dow Jones, which monitors customer-facing infrastructure and applications. Dow Jones provides global business news and information services to millions of consumers and enterprise media groups. Keeping these revenue-generating services running 7×24×365 is the highest priority. Chris also manages the DJ service management platforms (Remedy, Knowledge Base, etc.). He has been with Dow Jones for 10 years, the last 3 in his current role.

“Our mission is to address issues before they become service impacting events. Failures are going to happen — we need to make sure people know about them as soon as possible.”

The Splunk Set-up

The Dow Jones Splunk installation includes

  • Data from 6000+ servers globally,
  • 13,500+ source types,
  • 1,700 network devices (primarily Cisco and Juniper) and
  • Ten distributed Splunk servers in different geographies that index ~100 GB a day and provide a new global logging console.

Why Splunk?

Each Dow Jones command center can now know what’s happening across a wide range of internal and external services before customers do. Splunk speeds the time to resolution for email outages, which can hurt internal users’ productivity, and for editorial site downtime, which can directly impact customer service and revenue. Dow Jones has found Splunk generates significantly fewer false positives than traditional monitoring systems, and new resources are much easier to manage and deploy.

Splunk Live New York 2009

This week we’re on the East Coast enjoying some fantastic customer presentations and roundtables at Splunk Live events in New York City, Princeton, NJ and Washington, DC. It’s Tuesday and we have more than 100 customers and Splunk users attending Splunk Live in midtown Manhattan. The vibe is electric as we’re treated to awesome talks by IDT and New York Life. At lunch, long-term customers Bloomberg and AT&T joined the customer roundtable conversation.

Gabe Arnett, Senior Software Architect at Moody’s, demonstrated how Splunk is being used to monitor and troubleshoot the Moody’s Analytics platform. Gabe has more than 15 years of experience building web applications in financial services, investment banking and e-commerce. At Moody’s he’s responsible for the global development team that develops and supports the newly redesigned client-facing website, v3.moodys.com. Moody’s is a leading provider of research, data, analytic tools and related services to debt capital markets and credit risk management professionals. The company’s products and services provide the means to assess and manage the credit risk of individual exposures as well as portfolios; price and value holdings of debt instruments; analyze macroeconomic trends; and enhance customers’ risk management skills and practices.

Moody’s Splunk environment is used by 25 different users and runs on Windows 2003. Splunk gives Gabe’s developers secure access to the logs they need without touching the production devices, servers and applications. His team has built custom searches and a number of dashboards indicating the general health of their applications and services. Custom searches and alerts track errors and access, helping ensure a good user experience. The team also uses Splunk to understand when and where new content isn’t flowing to the v3 platform. A large part of the Moody’s user experience is delivering email alerts, and Splunk helps the team track GUIDs to ensure customers receive the alerts they’ve subscribed to.

The team recently migrated from Splunk 3 to Splunk 4; the upgrade took 30 minutes. The Splunk for Windows App has been significantly revamped in Splunk 4, and the Moody’s team is using it to monitor local server resources (disk, memory, networking) through WMI and to correlate this performance data with the Windows and application event logs.

Shay Benjamin, CSO and SVP of Architecture at IDT, designs and implements network architectures and manages compliance, security and fraud initiatives at IDT. IDT Corporation (www.idt.net) is a holding company focused on the telecommunications and energy industries. Since 1995 the company has been building hundreds of VOIP switches globally and assembling an international fiber optic network. IDT pioneered VOIP (Voice over Internet Protocol) to create Net2Phone, piloted the first commercial WiFi phone service in the US and has created a prepaid calling card business that sells 12 million calling cards a month.

IDT uses Splunk primarily for VOIP Call Detail Records (CDRs). The company indexes more than 120 million CDRs per day with six mirrored Splunk server instances. CDRs are somewhat like logs, but with many fixed, delimited fields. One or more CDRs are created at each switching or routing point for every VOIP call. CDRs vary between platform devices in number of fields and contents, and unlike logs, few CDR fields contain easy-to-read key=value pairs. Although CDRs are key to maintaining service quality, billing, monitoring network quality and security forensics, working with them is labor-intensive, and delays waste labor, time and money.
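To make the difference concrete, here is a minimal Python sketch of turning positional, delimited CDRs into named fields. It assumes two made-up platform layouts (`platform_a`, `platform_b`) with hypothetical field names; it is not how Splunk or IDT actually parse CDRs, just an illustration of why each CDR format needs its own field map while a key=value log line describes itself.

```python
import csv

# Hypothetical layouts for two imaginary switch platforms. Field names and
# positions are illustrative only; real CDR formats vary by vendor.
LAYOUTS = {
    "platform_a": {
        "delimiter": ",",
        "fields": ["start_time", "calling_number", "called_number",
                   "duration_s", "src_ip"],
    },
    "platform_b": {
        "delimiter": "|",
        "fields": ["call_id", "start_time", "trunk_group", "calling_number",
                   "called_number", "src_ip", "disconnect_cause"],
    },
}


def parse_cdr(line: str, platform: str) -> dict:
    """Map a positional, delimited CDR line to named fields for its platform."""
    layout = LAYOUTS[platform]
    # csv.reader handles quoting; field names come only from position.
    values = next(csv.reader([line], delimiter=layout["delimiter"]))
    if len(values) != len(layout["fields"]):
        raise ValueError(
            f"{platform}: expected {len(layout['fields'])} fields, got {len(values)}"
        )
    record = dict(zip(layout["fields"], values))
    record["platform"] = platform  # remember which format the record came from
    return record


def parse_kv_log(line: str) -> dict:
    """For contrast: a key=value log line carries its own field names."""
    return dict(pair.split("=", 1) for pair in line.split())


if __name__ == "__main__":
    print(parse_cdr("2009-09-15 10:01:02,12015550100,442075550123,184,10.1.2.3",
                    "platform_a"))
    print(parse_kv_log("src=10.1.2.3 action=allow"))
```

Keeping the layout table separate from the parsing logic means a new switch platform only needs a new entry, not new code.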

IDT needs fast searches across all fields of the CDRs and quick data loading, to allow fast retrieval of call data and cross-platform searches that unify results from different CDR formats. Historically IDT used a custom RDBMS solution with an application called Call Genius. In the RDBMS, IDT was forced to limit the fields that got indexed, because indexing CDRs in an RDBMS is costly: it takes up a lot of space and slows load times. The RDBMS also indexed only fields common to multiple platforms’ CDRs. Much of the CDR data was put into BLOBs (actually CLOBs), with multiple CDR fields mapped into a single RDBMS field to try to achieve efficiency. But BLOBs are very difficult to search and to index effectively, and the legacy Call Genius application didn’t permit searching CDR BLOBs.

Now IDT utilizes Splunk to index all CDR fields. No need to decide what fields to index and cross platform searches are easy without losing specific platform CDR format resolution. There is no longer a need to create BLOBs for efficiency. Engineers and support staff are able to quickly search for any combination of

  • Phone Number
  • IP address
  • Trunk Group Name

Splunk naturally and easily links search terms across fields and the users just need to enter the phone number or IP and get back the CDR events and transactions.
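The idea of searching every field at once can be sketched in a few lines of plain Python. This is a hypothetical illustration, not Splunk’s search engine: the sample records, the field names (`ani`, `dnis`, `trunk_group`, etc.) and the `search_any_field` helper are all assumptions made up for the example.

```python
from typing import Dict, Iterable, List

# Hypothetical, already-parsed CDRs from two imaginary platforms. Note that
# the same phone number lives under different field names per platform.
SAMPLE_CDRS: List[Dict[str, str]] = [
    {"platform": "platform_a", "calling_number": "12015550100",
     "called_number": "442075550123", "src_ip": "10.1.2.3"},
    {"platform": "platform_b", "ani": "12015550100", "dnis": "18005550199",
     "trunk_group": "TG_NYC_01", "src_ip": "10.9.8.7"},
    {"platform": "platform_b", "ani": "12125550142", "dnis": "12015550100",
     "trunk_group": "TG_NJ_02", "src_ip": "10.9.8.7"},
]


def search_any_field(records: Iterable[Dict[str, str]],
                     *terms: str) -> List[Dict[str, str]]:
    """Return records where every term equals the value of at least one field.

    The user never says which field to look in, so a phone number matches
    whether it appears as the caller or the callee, on either platform.
    """
    matches = []
    for record in records:
        values = set(record.values())  # assumes string field values
        if all(term in values for term in terms):
            matches.append(record)
    return matches


if __name__ == "__main__":
    for cdr in search_any_field(SAMPLE_CDRS, "12015550100", "10.9.8.7"):
        print(cdr["platform"], cdr.get("trunk_group"))
```

Because the match is on values rather than field names, adding a third CDR format would not require changing the search, which is roughly the convenience the IDT team describes.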

Comparing Splunk to the RDBMS solution, IDT found Splunk searches to be 50 to 100x faster on data the RDBMS had not indexed. Searches on indexed fields are also faster in Splunk than in the previous RDBMS solution. Splunk load times for a typical sample average 1 to 5 minutes, versus 20 to 40 minutes for the RDBMS.

IDT is in the process of feeding firewall, security, router, IP network, and switch data into Splunk as well. They’re already discovering that Splunk finds errors not captured by network management consoles, and it has provided valuable troubleshooting during recent datacenter migrations.

Most of all, IDT is looking forward to discovering new ways to use all the data in Splunk. Heuristic analysis and business intelligence applications are at the top of their list, including using Splunk to find human “Family and Friends” networks and drive the development of new commercial programs.