thebaumblog: Archive for October, 2009

SplunkLive Seattle Kicks IT

On an incredibly beautiful day, more than 100 Splunk devotees attended our first-ever SplunkLive event in Seattle last week. In the shadow of Microsoft, we talked about our Windows and Microsoft strategy and compared notes with lots of customers running mixed Microsoft, Linux and Solaris environments. Many of our customers with Microsoft Active Directory, Exchange and SharePoint environments use Splunk to troubleshoot problems and implement security and compliance controls in large-scale, distributed environments. But I’m still surprised at how little Microsoft .NET we’re seeing in large-scale production applications.

Three Seattle-based customers presented their views on managing mission-critical applications, IT data consolidation and Splunk.

  • T-Mobile USA
  • Blue Nile
  • Washington State University

T-Mobile USA

Sean White, Senior Engineer with T-Mobile Operations in Bellevue, talked with us about their global rollout of Splunk. Sean is a member of the security engineering team charged with incident response, IDS, vulnerability scanning, anti-virus and enterprise unified logging. He graduated with a B.S. in Computer Science from the University of Kansas and has a deep background in large telecom environments: first as a system administrator and webmaster, then in SS7 network C&C and performance engineering, and now in information security. Sean has been at T-Mobile for 4 years; before that he was at Cingular and AT&T Wireless. T-Mobile USA is the 4th largest US national provider of wireless voice, messaging and data services, with 34M subscribers and annual revenues of $17B. T-Mobile USA is the US operating entity of T-Mobile International AG, the mobile communications subsidiary of Deutsche Telekom AG (NYSE: DT). Deutsche Telekom is one of the largest telecommunications companies in the world, with nearly 120 million customers worldwide.

It all started with PCI Compliance

Like many of our enterprise customers, T-Mobile started working with Splunk in one area but quickly saw the value of expanding into others. For Sean and his team, PCI Compliance was the beginning of the Splunk solution footprint, but soon everyone realized the consolidation of logs, events, messages, configurations and changes meant a whole lot more.

Proving PCI compliance comes with very specific requirements. PCI Section 10 says to track and monitor all access to network resources handling cardholder data. But in T-Mobile’s case, scale was a big issue. Fulfilling PCI DSS Section 10 meant tracking 26+ in-scope applications and being able to trace transactions from start to finish across 650+ servers running Windows, Linux and various Unix flavors. It also meant more than 100 individuals logging into Splunk daily as part of the process.
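To make “trace transactions from start to finish” a bit more concrete, here is a minimal Python sketch. It assumes each server’s events have already been normalized to carry a timestamp, a host name and a shared transaction ID; the `Event` fields and the `trace_transactions` and `hop_path` helpers are hypothetical and are not T-Mobile’s setup or a Splunk API. In Splunk itself this stitching would normally happen in a search rather than in external code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical normalized log event; field names are illustrative."""
    ts: float      # epoch seconds
    host: str      # server that logged the event
    txn_id: str    # transaction ID shared by every tier that touched it
    message: str


def trace_transactions(events):
    """Group events by transaction ID, then order each group by timestamp."""
    groups = defaultdict(list)
    for ev in events:
        groups[ev.txn_id].append(ev)
    return {txn: sorted(evs, key=lambda e: e.ts) for txn, evs in groups.items()}


def hop_path(trace):
    """Ordered list of hosts a single transaction passed through."""
    return [ev.host for ev in trace]


if __name__ == "__main__":
    sample = [
        Event(3.0, "db01", "T42", "commit"),
        Event(1.0, "web01", "T42", "checkout start"),
        Event(2.0, "app07", "T42", "authorize card"),
        Event(1.5, "web02", "T43", "checkout start"),
    ]
    traces = trace_transactions(sample)
    print(hop_path(traces["T42"]))  # ['web01', 'app07', 'db01']
```

Grouping first and then sorting each group keeps the sketch simple; at the scale of 650+ servers, clock skew between hosts and events without a transaction ID would also need handling.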

The Splunk Set-up

The Splunk configuration consists of:

  • Pairs of forwarders in each of 4 geographic locations.
  • Three short-term indexers plus 1 short-term search box.
  • Three long-term search boxes connected to a 32 TB NAS.
  • Central control from a single deployment server.

The current installation indexes more than 600GB of data per day and has just passed the 10B event mark. Controlling access to all this data is essential, so T-Mobile has set up Splunk roles that limit managers and application teams to subsets of the data. Segregating data access along lines of duty is critical to proving PCI compliance.
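The idea behind role-based segregation can be sketched in a few lines of Python. This assumes a hypothetical mapping from role names to the indexes each role may search (all role and index names below are made up); Splunk enforces this through its own role configuration, so the code only illustrates the deny-by-default logic, not how T-Mobile configured it.

```python
# Hypothetical role -> searchable-index mapping; all names are made up.
ROLE_INDEXES = {
    "pci_manager": {"pci_audit", "pci_app"},
    "billing_app_team": {"billing_app"},
}


def visible_events(role, events):
    """Return only events stored in an index the role may search.

    Unknown roles get an empty set, so access is denied by default.
    """
    allowed = ROLE_INDEXES.get(role, set())
    return [ev for ev in events if ev["index"] in allowed]


if __name__ == "__main__":
    events = [
        {"index": "pci_audit", "msg": "login by admin"},
        {"index": "billing_app", "msg": "invoice run"},
    ]
    print(visible_events("billing_app_team", events))
    print(visible_events("contractor", events))  # []
```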

The Business Case for a SOC

In addition to proving PCI compliance, T-Mobile has discovered Splunk’s value for security. Not long ago, a SIEM vendor would have told you that IDS and firewall logs were all you needed, and that >=2 sources of data == correlation. Not so much.

“All the best new vulnerabilities are coming in on the application layer.”
- Sean White

Enterprise logging (visibility into all of your IT data) is absolutely critical in defending against modern blended attacks. At T-Mobile, Splunk has become a primary analysis tool for deciphering what is happening to the applications, servers and devices on the network. With a few saved searches, Splunk does real correlation.
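As a rough illustration of correlation that reaches past IDS and firewall logs into the application layer, the Python sketch below pairs IDS alerts with application errors from the same source IP within a time window. The `Event` type, the 300-second default window and the `correlate` function are assumptions for illustration only; a saved search in Splunk would express the same idea in its own search language.

```python
from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical event shape shared by IDS and application logs."""
    ts: float      # epoch seconds
    src_ip: str
    detail: str


def correlate(ids_alerts, app_errors, window=300.0):
    """Pair each IDS alert with application errors from the same source IP
    that occur within `window` seconds after the alert.

    A naive nested loop: fine for illustration, O(len(ids) * len(app)).
    """
    pairs = []
    for alert in ids_alerts:
        for err in app_errors:
            if err.src_ip == alert.src_ip and 0 <= err.ts - alert.ts <= window:
                pairs.append((alert, err))
    return pairs


if __name__ == "__main__":
    ids = [Event(100.0, "198.51.100.7", "SQL injection signature")]
    app = [
        Event(250.0, "198.51.100.7", "HTTP 500 on /login"),  # same IP, in window
        Event(500.0, "198.51.100.7", "HTTP 500 on /cart"),   # same IP, too late
        Event(150.0, "203.0.113.5", "HTTP 500 on /login"),   # different IP
    ]
    for alert, err in correlate(ids, app):
        print(alert.detail, "->", err.detail)
```

Only the first application error is paired here: the second falls outside the window and the third comes from a different address.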

Nothing Boring about Logs and IT Data!

PCI compliance mandates gave T-Mobile the excuse (read: funding) to start an enterprise logging initiative. Logging all security, network and application events can give the insight needed not only to measure and report on compliance controls but also to run a more secure and effective business. T-Mobile has also discovered that the ability to ask any question of their environment and get immediate answers adds a pile of value to help desk operations and business intelligence functions.

“All the information about your company is in your logs—there’s nothing boring about it.”


Blue Nile

Jerry Brennock, Director of Core Development at Blue Nile, explained how the company is using Splunk to improve the experience of buying diamonds over the Web. Blue Nile, Inc. is an online retailer of diamonds and fine jewelry offering in-depth educational materials and unique online tools that put consumers in control of the jewelry shopping process. Importantly, the focus is on giving customers a great experience at a great price, which translates into requiring high quality at a low cost. Jerry’s team builds and supports the infrastructure and applications for merchandising and marketing, including the website. He’s been with Blue Nile for 10 years and in the e-commerce space for more than 17.

The Killer Diamond App

Diamond Search is undoubtedly the killer application in Blue Nile’s e-commerce experience. It’s an asynchronous JavaScript application that has to work across every browser, and it has many non-obvious use cases. Together, these three factors make it prone to failure in lots of edge cases.

“If this application isn’t fast and accurate, we don’t sell diamonds.”
- Jerry Brennock

Jerry’s team has embedded tracking pixels with name-value pairs to capture JavaScript profile information from each diamond search. Together with Web server 500 and 404 errors, this gives the development, operations and customer support teams all the data they need to troubleshoot problems. The challenge is finding customer problems “in the moment,” before the sale is lost.
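Here is a minimal Python sketch of how pixel hits and Web server errors might be pulled out of an access log. It assumes Common Log Format lines, a hypothetical pixel path `/px.gif` and made-up parameter names (`evt`, `ms`, `browser`); the talk doesn’t describe Blue Nile’s actual pixel format, so treat this as an illustration of the name-value idea only.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Request line and status code of a Common Log Format entry.
LOG_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" (?P<status>\d{3})')

PIXEL_PATH = "/px.gif"  # hypothetical tracking-pixel path


def parse_line(line):
    """Return (path, status, name-value pairs), or None if the line doesn't match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    target = urlsplit(m.group("target"))
    return target.path, int(m.group("status")), dict(parse_qsl(target.query))


def summarize(lines):
    """Collect pixel name-value records and count 404 and 500 responses."""
    pixels, errors = [], {404: 0, 500: 0}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        path, status, fields = parsed
        if path == PIXEL_PATH:
            pixels.append(fields)
        if status in errors:
            errors[status] += 1
    return pixels, errors


if __name__ == "__main__":
    log = [
        '10.0.0.1 - - [20/Oct/2009:10:00:00 -0700] "GET /px.gif?evt=search&ms=840&browser=ie6 HTTP/1.1" 200 43',
        '10.0.0.2 - - [20/Oct/2009:10:00:01 -0700] "GET /diamonds/search HTTP/1.1" 500 512',
        '10.0.0.3 - - [20/Oct/2009:10:00:02 -0700] "GET /missing.js HTTP/1.1" 404 0',
    ]
    pixels, errors = summarize(log)
    print(pixels)  # [{'evt': 'search', 'ms': '840', 'browser': 'ie6'}]
    print(errors)  # {404: 1, 500: 1}
```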

Social Documentation Benefits and Pitfalls

Tim Jones of Agora Games posted a good summary of his experience with Splunk. Tim reveals what we’ve known for some time: Splunk is incredibly flexible and powerful, but sometimes finding the Splunk documentation that shows how to do exactly what you want isn’t as easy as it should be.

We’ve struggled over the years to keep our documentation both up to date and easy to use. Earlier this year we moved to a wiki-based approach to Splunk documentation in hopes of keeping it more current and making it more usable with inter-documentation links. Suffice it to say, we are still embryonic in applying wiki technology to documentation. We power our docs site with MediaWiki, the PHP wiki software that runs Wikipedia. Along the way we’ve had to build a lot of capability around the MediaWiki platform to control docs permissions and versioning.

If you sign up as a Splunk Community member, you can modify and add to the Splunk Knowledgebase and docs wiki yourself, including the ability to:

  • edit discussion tabs
  • edit any page except for major landing pages and
  • add new pages.

We’re taking this “extended community approach” to documentation because we know there are many people like Tim who have the ability to help us make not just the Splunk download and bits better, but also the Splunk documentation better and more complete. We realize the risk of opening up our documentation to the community is that things won’t always be as easy to find as they should be. But we believe that in the long run this social approach to documentation will make Splunk a much better experience.

Please let us know what you think and how we can improve.

Happy Splunking

Splunk Live Taipei Breaks All Records

More than 300 people attended Splunk Live Taipei last week, and our partners at Systex hosted an incredible show of Splunk use cases, customer speakers and hands-on labs. The Systex Splunk Lab gave attendees the opportunity to use Splunk with CICS and IBM System z mainframe data, Windows servers and desktops, Unix and Linux, customer service operations environments, telco provisioning environments and more.

I’ll be posting separately on the hands-on Systex Splunk Lab.



Our first guest customer speaker was Yi-Lang Tsai (蔡一郎), Taiwan Chapter Chief Security Officer of the Global Honeynet Project and Division Manager at the National Center for High-performance Computing, a Honeynet Project sponsor. Yi-Lang is also a freelance writer with more than 30 published books on operating systems, network and system security, and IT management. He presented the very important botnet work the Honeynet Project is doing and showed how his team is using Splunk to deepen their research and share what they find with the Honeynet audience of security professionals worldwide.

What is Honeynet?

The mission of the Honeynet Project is to learn the tools, tactics and motives of the blackhat community and share the lessons learned. Honeynet is an all-volunteer organization of security professionals around the world dedicated to researching cyber threats by deploying networks to be hacked. Its goals are:

  • Awareness: to raise awareness of the threats that exist,
  • Information: to give those already aware technical information about threats, and
  • Research: to give organizations the capabilities to learn more on their own.

Honeynet is completely open source, and all of its work, research and findings are shared. Everything captured is happening in the wild (there is no theory). The organization has no agenda, no employees and no product or service to sell.

A Honeynet is simply a “high-interaction” honeypot that attracts any and all cyber threats and attacks. It is an architecture, not a product or software, and it is populated with live systems donated and run by the various Honeynet chapters around the globe.

Once the Honeynet is compromised, data is collected, correlated and analyzed to learn the tools, tactics and motives of the blackhat community. Specific benefits to the global community of security professionals are:

  • Research: Identifying new tools and new tactics,
  • Profiling: Generating and maintaining lists of blackhats,
  • Protection: Early detection, warning and prediction,
  • Response: Forensics and incident response and
  • Self-defense.

Taiwan Honeynet Chapter’s Environment

Yi-Lang’s environment at the Taiwan National Center for High-performance Computing distributes Honeynets/honeypots to the Taiwan Education Network, Taiwan Chapter members and the GDH project. The environment makes heavy use of virtualization in its deployment; you might call it a “Virtual Machine Honeynet.” It runs on an advanced blade server with 128GB of memory running VMware ESX, using either SAS or SSD storage. More than 200 Windows 2K/2K3, Windows XP/Vista/7, Linux and FreeBSD servers run as high- and low-interaction honeypots.

The Taiwan Honeynet deployment is distributed across four data centers in different cities: Taipei, Hsinchu, Taichung and Tainan. This distributed topology gives the Honeynet a broad-reaching capture network and makes use of idle network capacity and CPU. This large-scale Honeynet deployment supports:

  • Malware Collection and Analysis
  • Honey-Driven Botnet Detection
  • Client-Side Attack
  • Malicious Web Server Exploring
  • RFI Scripts Detection
  • Fast-Flux Domain Service Tracking
  • Research Alliance
  • Distributed Search and Analysis on Honeynet Data

Why Splunk?

The Taiwan Honeynet team uses Splunk to collect and manage information from the distributed Honeynet infrastructure, including GBs of logs, 400k+ connections, 2GB+ of traffic flows, and tool events and metrics.


Data analysis is performed against a variety of pivot points that are automatically extracted from the Honeynet data sources. Date and time, malware source IP address, destination IP, protocol, file name and malware MD5 are some of the main fields Splunk identifies and provides to the team for deeper analysis. In addition to Splunk searches and reports, the team has built custom geo-dashboards on high-resolution displays by tapping into the Splunk API.
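Pivoting on extracted fields boils down to grouping and counting. The Python sketch below assumes the fields have already been extracted into dictionaries; the keys (`src_ip`, `md5` and so on) are hypothetical stand-ins for the fields Splunk identifies, and the sample records are invented rather than taken from Honeynet data.

```python
from collections import Counter


def pivot(records, field, top=3):
    """Count records by one extracted field, most common first."""
    return Counter(r[field] for r in records if field in r).most_common(top)


def sources_for_sample(records, md5):
    """Distinct source IPs that delivered a given malware sample (by MD5)."""
    return sorted({r["src_ip"] for r in records
                   if r.get("md5") == md5 and "src_ip" in r})


if __name__ == "__main__":
    # Invented records using hypothetical field names.
    records = [
        {"src_ip": "203.0.113.5", "dest_ip": "10.1.1.9", "protocol": "tcp",
         "file_name": "a.exe", "md5": "m1"},
        {"src_ip": "198.51.100.7", "dest_ip": "10.1.1.9", "protocol": "tcp",
         "file_name": "a.exe", "md5": "m1"},
        {"src_ip": "203.0.113.5", "dest_ip": "10.1.1.4", "protocol": "udp",
         "file_name": "b.dll", "md5": "m2"},
    ]
    print(pivot(records, "md5"))               # [('m1', 2), ('m2', 1)]
    print(sources_for_sample(records, "m1"))   # ['198.51.100.7', '203.0.113.5']
```

Swapping the `field` argument (for example `"dest_ip"` or `"protocol"`) gives the other pivot views from the same records.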

This interactive geo-view gives the team botnet detection, malware presence, Honeynet traffic flows and an instant status report, all from one location.

Yong Sweah Liang (Linus), VP, Head of Infrastructure and Technology for Infocomm Asia Holdings Pte Ltd (IAHGames), was our second customer speaker.

IAH is an online game company operating some major properties, including:

  • EA SPORTS™ FIFA Online 2
  • Granado Espada
  • Dragonica
  • Distribution of Box products
  • BioShock®
  • Grand Theft Auto IV