thebaumblog: Splunk Live

Cisco CSIRT Presents at SplunkLive Raleigh

Last Thursday Dave Schwartzburg and a few other Cisco security mavens attended SplunkLive Raleigh. The Cisco Computer Security Incident Response Team (CSIRT) has been applying Splunk to corporate security investigations for more than two years now, and Dave was generous enough to share their experiences with us all. Also presenting at the event was James Ervin of the University of North Carolina at Chapel Hill, a very knowledgeable Splunk customer. Patrick Ogden, Splunk Sales Engineer, gave a rocking good demo of transaction tracing in a telco provisioning environment, and Will Hayes, Splunk Sr. Solution Architect, showed the latest Splunk for Cisco Security App being developed together with the Cisco CSIRT team.

Cisco CSIRT Team

Dave Schwartzburg

Dave Schwartzburg is an Information Security Investigator and runs the IDS infrastructure for Cisco Corporate and its internal networks and IT assets. He has an M.S. in Information Security from East Carolina University and a B.S. from the University of Wisconsin. Dave has been with the Cisco CSIRT team for two years; prior to that he was with AT&T Internet Investigations & Security Services. Cisco has more than 100,000 employees and contractors and more than 127,000 devices on its corporate network. That’s a lot to keep track of, which is why the CSIRT team uses Splunk.

The Cisco CSIRT works to reduce the risk of loss from security incidents for Cisco-owned businesses. CSIRT regularly engages in proactive threat assessment, mitigation planning, incident trending and analysis, security architecture, and incident detection and response. This work happens in three phases: investigation, mitigation and prevention.

A Tier 1 Event Analysis Group located in Costa Rica handles security threat monitoring. The Tier 2 Event Analysis Group in Bangalore handles investigations and mitigations for the easier cases. Dave is part of the Tier 3 Global Incident Response Team, which handles the more difficult cases and longer-term prevention through changes to the infrastructure and security systems.

Cisco Security Environment

Cisco regularly collects web proxy (Ironport WSA), anti-virus (Ironport ESA), host-based intrusion protection (Cisco Security Agent), syslog, VPN logs, authentication messages, network IDS signatures and Netflow records from critical subnets.

  • 3 million IDS events per day
  • 3-5 billion Netflow records per day
  • 300 malware-related cases a day

Some event sources send their data to a global network of collection servers and some event types are pulled from their sources directly to a centralized server. Splunk handles the collection and indexing of the data.

Correlation and Reporting with Splunk

The CSIRT team makes extensive use of scheduled reporting and alerting for proactive monitoring of problems.

In this example, the team correlates host-based IDS events with antivirus logs and runs malware reports via cron using the Splunk CLI. The reports are scheduled, and the results are e-mailed to the Event Analysis (EA) teams for processing and submission for remediation.
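The post doesn’t show the actual searches, so here is a minimal Python sketch of the correlation idea, not Cisco’s implementation: flag hosts where a host-IDS event and an anti-virus event land within a time window of each other. The event tuples, host names and the one-hour window are all hypothetical; in the setup described above, the equivalent logic would live in a scheduled Splunk search rather than a standalone script.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical, already-parsed events: (timestamp, host, detail).
# These are placeholders, not real CSA or anti-virus log formats.
hids_events = [
    (datetime(2009, 10, 1, 9, 0), "host-a", "CSA: suspicious process launch"),
    (datetime(2009, 10, 1, 11, 0), "host-b", "CSA: registry modification"),
]
av_events = [
    (datetime(2009, 10, 1, 9, 20), "host-a", "AV: trojan detected"),
    (datetime(2009, 10, 2, 9, 0), "host-b", "AV: adware detected"),
]


def correlate(hids, av, window=timedelta(hours=1)):
    """Pair HIDS and AV events on the same host that occur within `window`."""
    av_by_host = defaultdict(list)
    for ts, host, detail in av:
        av_by_host[host].append((ts, detail))

    matches = []
    for ts, host, detail in hids:
        for av_ts, av_detail in av_by_host.get(host, []):
            if abs(av_ts - ts) <= window:
                matches.append((host, detail, av_detail))
    return matches


# A cron-driven job could print or e-mail these rows as the malware report.
for host, hids_detail, av_detail in correlate(hids_events, av_events):
    print(f"{host}: {hids_detail} | {av_detail}")
```

Widening the window trades precision for recall: with a one-day window, the host-b pair above would also be reported.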

“Red Carpet Reports” monitor executive systems to make sure they aren’t infected or compromised. Here we see an example of the Koobface worm found in CSA logs on an executive laptop.

Finally, the team has found ways to make use of all the CSA data it receives. One of the most useful has been pinpointing people who disable Cisco Security Agent itself, which indicates the machine is now unmanaged.
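One simple way to spot machines that have gone unmanaged is to look at each host’s most recent agent status event. The sketch below assumes hypothetical, already-parsed status events with placeholder states "started" and "stopped"; real CSA log messages look different, so treat this purely as an illustration of the logic.

```python
# Hypothetical agent status events: (epoch_seconds, host, state).
# "started"/"stopped" are placeholder states, not real CSA message text.
status_events = [
    (100, "laptop-1", "started"),
    (200, "laptop-2", "started"),
    (300, "laptop-1", "stopped"),
    (400, "laptop-2", "stopped"),
    (500, "laptop-2", "started"),
]


def unmanaged_hosts(events):
    """Return hosts whose most recent agent state is 'stopped'."""
    latest = {}
    for ts, host, state in sorted(events):  # oldest first
        latest[host] = state  # later events overwrite earlier ones
    return sorted(host for host, state in latest.items() if state == "stopped")


print(unmanaged_hosts(status_events))  # ['laptop-1']
```

Keying on the latest state, rather than on any "stopped" event, avoids flagging machines like laptop-2 whose agent was restarted.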

Results for the Security Team

The productivity gain from centralized access to multiple data sources has been dramatic. Not only is the team lowering the time to respond to incidents, it is also enabling lower-skilled analysts to handle more complex cases. And, surprisingly, 10% of cases now come from previously unused or underutilized sources. The value of substantially faster access to important data, and of correlation across numerous sources for reporting and ad-hoc investigations, is incredible.

Splunk for Cisco Security App


University of North Carolina Chapel Hill

James Ervin

James has been doing system administration, network and security monitoring, and application development at UNC since 1998, when he completed his MS in Computer Science at NC State University. As part of the Information Technology Services (ITS) team at UNC, his projects have included work on the university’s original Active Directory deployment, Unix-based webmail systems, and security information and event monitoring. Earlier this year he inherited a centralized logging project for the university. UNC was the nation’s first state university and has served North Carolina for more than two centuries; it has 29,000 students and more than 4,000 faculty members. ITS is the largest IT organization on campus (~500 employees), looking after financials, admissions, centralized learning and centralized email. ITS frequently collaborates with the many other IT organizations on campus.

ITS Environment

The ITS team manages a moderate-size, mixed application, server and networking environment consisting of the following major components:

  • Multiple Unix flavors (AIX, RHEL, Solaris)
  • Large Windows infrastructure
  • ~600 devices total
  • ~20 IPS/IDS/FW/LB devices
  • PDU, environment probe data
  • Apache, Tomcat, JBoss

This environment is constantly in flux as students and faculty come and go and non-managed desktops, laptops and mobile devices connect to the network.

“We needed to determine what is possible within our environment and adopt a flexible architecture.”
- James Ervin

Earlier this year, James and his team were facing an ever-growing list of requirements for their centralized log management project, including:

  • Make syslog services more useful to the rest of the IT organizations
  • Collect and centralize Windows event logs
  • Alert on events of interest
  • Correlate security events
  • Provide NOC/SOC staff access to security logs
  • Give application developers access to application logs
  • Report on unplanned system changes
  • Satisfy the auditors

SplunkLive Seattle Kicks IT

On an incredibly beautiful day last week, more than 100 Splunk devotees attended our first-ever SplunkLive event in Seattle. In the shadow of Microsoft, we talked about our Windows and Microsoft strategy and compared notes with lots of customers running mixed Microsoft, Linux and Solaris environments. Many of our customers with Microsoft Active Directory, Exchange and SharePoint environments are using Splunk to troubleshoot problems and implement security and compliance controls in large-scale, distributed environments. But I’m still surprised at how little Microsoft .NET we’re seeing in large-scale production applications.

Three Seattle-based customers presented their views on managing mission critical applications, IT data consolidation and Splunk.

  • T-Mobile USA
  • Blue Nile
  • Washington State University

T-Mobile USA

Sean White, Senior Engineer with T-Mobile Operations in Bellevue, talked with us about their global rollout of Splunk. Sean is a member of the security engineering team charged with incident response, IDS, vulnerability scanning, anti-virus and enterprise unified logging. He graduated with a B.S. in Computer Science from the University of Kansas and has a deep background in large telecom environments: first as a system administrator and webmaster, then in SS7 network C&C and performance engineering, and now in information security. Sean has been at T-Mobile for 4 years; prior to that he was at Cingular and AT&T Wireless. T-Mobile USA is the 4th largest US national provider of wireless voice, messaging and data services, with 34M subscribers and annual revenues of $17B. T-Mobile USA is the US operating entity of T-Mobile International AG, the mobile communications subsidiary of Deutsche Telekom AG (NYSE: DT). Deutsche Telekom is one of the largest telecommunications companies in the world, with nearly 120 million customers worldwide.

It all started with PCI Compliance

Like many of our enterprise customers, T-Mobile started working with Splunk in one area but quickly saw the value of expanding into others. For Sean and his team, PCI Compliance was the beginning of the Splunk solution footprint, but soon everyone realized the consolidation of logs, events, messages, configurations and changes meant a whole lot more.

Proving PCI compliance came with very specific requirements. PCI DSS Requirement 10: Track and monitor all access to network resources and cardholder data. But in T-Mobile’s case, scale was a big issue. Fulfilling PCI DSS Requirement 10 meant tracking 26+ in-scope applications and being able to trace transactions from start to finish across 650+ servers running Windows, Linux and Unix varieties. It also meant more than 100 individuals logging into Splunk on a daily basis as part of the process.
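Tracing a transaction across hundreds of servers boils down to grouping events by a shared transaction ID and ordering them in time, similar in spirit to Splunk’s `transaction` search command. This Python sketch is only an illustration: it assumes every event already carries a hypothetical transaction ID, and the server names and timestamps are made up.

```python
from collections import defaultdict

# Hypothetical events: (epoch_seconds, server, transaction_id, message).
events = [
    (10, "web-01", "tx-42", "request received"),
    (11, "web-02", "tx-43", "request received"),
    (12, "app-07", "tx-42", "card authorized"),
    (15, "db-03", "tx-42", "order committed"),
]


def trace(evts):
    """Group events by transaction ID into (start, end, ordered server path)."""
    by_tx = defaultdict(list)
    for ts, server, tx_id, _msg in evts:
        by_tx[tx_id].append((ts, server))

    traces = {}
    for tx_id, hops in by_tx.items():
        hops.sort()  # order each transaction's hops by timestamp
        traces[tx_id] = (hops[0][0], hops[-1][0], [server for _, server in hops])
    return traces


for tx_id, (start, end, path) in sorted(trace(events).items()):
    print(f"{tx_id}: {end - start}s via {' -> '.join(path)}")
```

In real logs the ID often has to be extracted first, and transactions that never reach a final step are exactly the ones worth alerting on.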

The Splunk Set-up

The Splunk configuration consists of

  • Pairs of forwarders set up in each of 4 geographic locations.
  • Three short-term indexers plus one short-term search box.
  • Three long-term search boxes hooked into a 32 TB NAS.
  • Centrally controlled from a single deployment server.

The current installation is indexing more than 600GB/day of data and has just passed the 10B event mark. Controlling access to all this data is critical, and T-Mobile has Splunk roles set up for managers and application teams to limit their access to subsets of the data. The ability to segregate data access along lines of duty is essential to proving PCI compliance.
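Splunk roles can restrict which indexes a user is allowed to search. The sketch below models that idea in plain Python with a hypothetical role-to-index mapping (the role names, index names and events are invented for illustration), showing how segregating data by duty narrows what each user sees. It is not how Splunk enforces roles internally.

```python
# Hypothetical role-to-index mapping; role and index names are illustrative only.
ROLE_INDEXES = {
    "pci_auditor": {"pci_apps", "auth"},
    "billing_app_team": {"billing_app"},
    "security_admin": {"pci_apps", "auth", "billing_app", "firewall"},
}

events = [
    {"index": "pci_apps", "msg": "cardholder lookup"},
    {"index": "billing_app", "msg": "invoice generated"},
    {"index": "firewall", "msg": "deny tcp 10.0.0.5:443"},
]


def visible_events(user_roles, evts):
    """Keep only events from indexes permitted by at least one of the user's roles."""
    allowed = set()
    for role in user_roles:
        allowed |= ROLE_INDEXES.get(role, set())  # unknown roles grant nothing
    return [e for e in evts if e["index"] in allowed]


print([e["msg"] for e in visible_events(["billing_app_team"], events)])
```

Permissions are the union of a user’s roles, so an application team sees its own data while a security administrator sees everything.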

The Business Case for a SOC

In addition to proving PCI compliance, T-Mobile has found Splunk valuable for security as well. Not long ago, a SIEM vendor would have told you IDS and firewall logs were all you need. That >=2 sources of data == correlation. Not so much.

“All the best new vulnerabilities are coming in on the application layer.”
- Sean White

Enterprise logging, meaning visibility into all of your IT data, is absolutely critical in defending against modern blended attacks. At T-Mobile, Splunk has become a primary analysis tool for deciphering what is happening to the applications, servers and devices on the network. With a few saved searches, Splunk does real correlation.

Nothing Boring about Logs and IT Data!

PCI compliance mandates gave T-Mobile the excuse (read: funding) to start an enterprise logging initiative. Logging all security, network and application events can give the insight needed not only to measure and report on compliance controls but also to run a more secure and effective business. T-Mobile has also discovered that the ability to ask any question of their environment and get immediate answers provides a pile of value to help desk operations and business intelligence functions.

“All the information about your company is in your logs—there’s nothing boring about it.”


Blue Nile

Jerry Brennock, Blue Nile’s Director of Core Development, explained how the company is using Splunk to improve the experience of buying diamonds over the Web. Blue Nile, Inc. is an online retailer of diamonds and fine jewelry offering in-depth educational materials and unique online tools that place consumers in control of the jewelry shopping process. Importantly, the focus is on giving customers a great experience at a great price, which translates to requiring high quality at a low cost. Jerry’s team builds and supports the infrastructure and applications for merchandising and marketing, including the website. He’s been with Blue Nile for 10 years and in the e-commerce space for more than 17 years.

The Killer Diamond App

Diamond Search is undoubtedly the killer application of Blue Nile’s e-commerce experience. It’s an asynchronous JavaScript app that has to work across any browser, and there are many non-obvious use cases. Together, these three factors make it prone to failure in lots of edge cases.

“If this application isn’t fast and accurate, we don’t sell diamonds.”
- Jerry Brennock

Jerry’s team has embedded tracking pixels with name-value pairs to capture JavaScript profile information from each diamond search. Together with Web server 500 and 404 errors, this gives the development, operations and customer support teams all the data they need to troubleshoot problems. The challenge is finding customer problems “in the moment”, before the sale is lost.
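Tracking pixels typically arrive as ordinary GET requests in the Web server access log, with the name-value pairs in the query string. Here is a minimal Python sketch, assuming a hypothetical `/px.gif` pixel path, made-up parameter names and an arbitrary 2-second threshold (none of these are Blue Nile’s actual values), that pulls out slow searches and the 404/500 errors mentioned above:

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical (request path, HTTP status) pairs taken from an access log.
# The /px.gif path and page/render_ms/browser parameters are illustrative only.
requests = [
    ("/px.gif?page=diamond_search&render_ms=850&browser=ff3", 200),
    ("/px.gif?page=diamond_search&render_ms=4200&browser=ie6", 200),
    ("/diamond-search/results", 500),
    ("/img/missing.png", 404),
]

SLOW_MS = 2000  # arbitrary threshold for a "slow" search


def pixel_fields(path):
    """Return the pixel's name=value pairs as a dict (first value wins)."""
    return {k: v[0] for k, v in parse_qs(urlsplit(path).query).items()}


def slow_searches(reqs):
    """Pixel hits whose reported render time exceeds SLOW_MS."""
    hits = []
    for path, _status in reqs:
        if urlsplit(path).path == "/px.gif":
            fields = pixel_fields(path)
            if int(fields.get("render_ms", "0")) > SLOW_MS:
                hits.append(fields)
    return hits


def server_errors(reqs):
    """Requests that returned 404 or 500."""
    return [(path, status) for path, status in reqs if status in (404, 500)]


for fields in slow_searches(requests):
    print("slow search:", fields)
for path, status in server_errors(requests):
    print(status, path)
```

Keeping the browser field alongside the timing is what makes the cross-browser edge cases visible: a slow search can be traced back to the browser that reported it.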

Splunk Live Taipei Breaks All Records

More than 300 people attended Splunk Live Taipei last week and our partners at Systex hosted an incredible show of Splunk use cases, customer speakers and hands-on labs. The Systex Splunk Lab provided attendees with the opportunity to use Splunk with CICS and IBM System z mainframe data, Windows, servers and desktops, Unix and Linux, customer service operations environments, telco provisioning environments and more.

I’ll be posting separately on the hands-on Systex Splunk Lab.



Our first guest customer speaker was Yi-Lang Tsai (蔡一郎), Chief Security Officer of the Taiwan Chapter of the Global Honeynet Project and Division Manager at the National Center for High-performance Computing, a Honeynet Project sponsor. Yi-Lang is also a freelance writer with more than 30 books published on operating systems, network and system security, and IT management. He presented the very important botnet work the Honeynet Project is doing and showed how his team is using Splunk to deepen their research and expose what they find to the Honeynet audience of security professionals worldwide.

What is Honeynet?

The mission of the Honeynet Project is to learn the tools, tactics and motives of the blackhat community, and share the lessons learned. Honeynet is an all-volunteer organization of security professionals around the world dedicated to researching cyber threats by deploying networks to be hacked. The goals are:

  • Awareness: to raise awareness of threats that exist,
  • Information: for those already aware, to teach and inform about threats, and
  • Research: to give organizations the capabilities to learn more on their own.

Honeynet is completely open source, and all of the work, research and findings are shared. Everything captured is happening in the wild (there is no theory). The organization has no agenda, no employees and no product or service to sell.

A Honeynet is simply a “high-interaction” honeypot attracting any and all cyber threats and attacks. It is an architecture, not a product or software, that gets populated with live systems donated and run by the various Honeynet chapters globally.

Once the Honeynet is compromised, data is collected, correlated and analyzed to learn the tools, tactics and motives of the blackhat community. Specific benefits to the global community of security professionals are:

  • Research: identifying new tools and new tactics,
  • Profiling: generating and maintaining lists of blackhats,
  • Protection: early detection, warning and prediction,
  • Response: forensics and incident response, and
  • Self-defense.

    Taiwan Honeynet Chapter’s Environment

    Yi-Lang’s environment at the Taiwan National Center for High Performance Computing distributes Honeynet/Honeypots to the Taiwan Education Network, Taiwan Chapter members and the GDH project. The environment makes heavy use of virtualization in its deployment; you might call it a “Virtual Machine Honeynet.” It runs on an advanced blade server with 128GB of memory running VMware ESX, using either SAS or SSD storage. More than 200 Windows 2K/2K3, Windows XP/Vista/7, Linux and FreeBSD servers run in high- and low-interaction honeypots.

    The Taiwan Honeynet deployment is distributed across four data centers in different geographies: Taipei, Hsinchu, Taichung and Tainan. This distributed topology gives the Honeynet a broad-reaching capture network and makes use of idle network and CPU capacity. This large-scale Honeynet deployment supports:

    • Malware Collection and Analysis
    • Honey-Driven Botnet Detection
    • Client-Side Attack
    • Malicious Web Server Exploring
    • RFI Scripts Detection
    • Fast-Flux Domain Service Tracking
    • Research Alliance
    • Distributed Search and Analysis on Honeynet Data

    Why Splunk?

    The Taiwan Honeynet team uses Splunk to collect and manage information from the distributed Honeynet infrastructure, including GBs of logs, 400k+ connections, 2GB+ of traffic flows, and tool events and metrics.



    Data analysis is performed against a variety of pivot points that are automatically extracted from the Honeynet data sources. Date & Time, Malware Source IP Address, Destination IP, Protocol, File Name and Malware MD5 are some of the main fields Splunk identifies and provides to the team for deeper analysis. In addition to Splunk searches and reports, the team has built custom geo-dashboards for high-resolution displays by tapping into the Splunk API.
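Once fields like source IP and malware MD5 are extracted, pivoting is mostly grouping and counting. Here is a small Python sketch over hypothetical, already-extracted records (documentation-range IP addresses and placeholder hash values, not real Honeynet data) that counts distinct sources per malware sample and distinct samples per source:

```python
from collections import defaultdict

# Hypothetical, already-extracted records. IPs come from documentation ranges
# and the md5 values are placeholders, not real malware hashes.
records = [
    {"src_ip": "198.51.100.7", "dest_ip": "192.0.2.10", "proto": "tcp",
     "file": "a.exe", "md5": "md5-aaa"},
    {"src_ip": "203.0.113.5", "dest_ip": "192.0.2.11", "proto": "tcp",
     "file": "a.exe", "md5": "md5-aaa"},
    {"src_ip": "198.51.100.7", "dest_ip": "192.0.2.12", "proto": "udp",
     "file": "b.dll", "md5": "md5-bbb"},
]


def pivot(recs, key, count_field):
    """For each value of `key`, count the distinct values of `count_field`."""
    groups = defaultdict(set)
    for rec in recs:
        groups[rec[key]].add(rec[count_field])
    return {value: len(seen) for value, seen in groups.items()}


print(pivot(records, "md5", "src_ip"))  # sources seen per malware sample
print(pivot(records, "src_ip", "md5"))  # distinct samples per source
```

Swapping the key and count fields is what turns one data set into many views: a sample spreading from many sources looks different from one source dropping many samples.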

    This interactive geo-view provides the team with botnet detection, malware presence, Honeynet traffic flows and an instant status report, all from one location.

    Yong Sweah Liang (Linus), VP, Head of Infrastructure and Technology for Infocomm Asia Holdings Pte Ltd (IAHGames) was our second customer speaker.

    IAH is an online game company operating some major properties including:

    • EA SPORTS™ FIFA Online 2
    • Granado Espada
    • Dragonica
    • Distribution of Box products
    • BioShock®
    • Grand Theft Auto IV



Splunk Live Washington DC 2009

    Obama-nomics is highly visible in our nation’s capital these days. The DC economy is humming as our tax dollars are hard at work fueling all kinds of government spending. With more than 100 attendees at Splunk Live on Thursday, we certainly were not disappointed in our quest to help make all this growth in government more efficient! Managing large networks and security forensics were the hot topics of conversation at Splunk Live Washington, DC, where everyone was treated to three incredible speakers.

    Our first speaker was Andy Purdy, Co-Director of the International Cyber Center at George Mason University and former Acting Director of the National Cyber Security Division (NCSD) and US-CERT at the Department of Homeland Security. Andy was a member of the White House staff team that drafted the U.S. National Strategy to Secure Cyberspace (2003) and served on the DHS tiger team that formed the NCSD. He spent 3 1/2 years at DHS, the last two heading the NCSD and US-CERT as the “Cyber Czar” of the U.S. Andy is also a Special Government Employee on the Defense Science Board Task Force on Mission Impact of Foreign Influence on DoD Software, and a partner with the law firm of Allenbaugh Samini Gosheh, LLP.

    The Constantly Changing Threat Landscape

    Andy talked with us about the changing threat landscape and lessons learned from past approaches to cyber security that can be applied in a forward-looking approach to risk management and compliance.

    Since much of his experience has been spent preparing the country for what cyber threats are coming next, Andy thinks of IT security as a war fought in a constantly morphing theater with new technologies and vulnerabilities and new motivations and threats.

    A Different Approach Moving Forward

    For anyone serious about security this is a sound perspective, whether you are a government agency, a major enterprise or a small business. But the balance between open networks and services and robust security remains one of the major challenges for IT organizations. Andy pointed us to lessons learned from his past, fueling a vibrant conversation during the customer and speaker roundtable. Perhaps the most important thing I heard was that it’s not enough to prepare for the last war, or the last successful attack. While perimeter defense and legacy standards for network security provide some measure of security, those measures are very often insufficient to deal with new threats that seem to be gaining in sophistication at an accelerating pace. Andy encouraged us to focus on adopting new requirements and security infrastructure for situational awareness and control.

    Greater sophistication, slower and lower-level attacks, and greater knowledge about the targets (data, activity, vulnerabilities) are all contributing to the need for near-real-time visibility on a large scale. This has become far more important than sub-second correlation of known attack vectors against discrete sets of network devices.

    “NIST perspective: Continuing serious cyber attacks on federal information systems, large and small; targeting key federal operations and assets. Attacks are organized, disciplined, aggressive, and well resourced; many are extremely sophisticated. Adversaries are nation states, terrorist groups, criminals, hackers, and individuals or groups with intentions of compromising federal information systems.”

    Andy went on to discuss how the effective deployment of malicious software, causing significant exfiltration of sensitive information (including intellectual property) and the potential for disruption of critical information systems and services, has made detecting information and data leakage a key government and enterprise security requirement.

    Bob Flores, former CTO and 31-year veteran of the CIA, was our next speaker. Bob retired from the CIA six months ago and is now President and CEO of Applicology, providing cyber security and IT strategy consulting services. In his 31 years at the CIA, he held various positions in the Directorate of Intelligence, the Directorate of Support and the National Clandestine Service. Most recently he was the CIA’s CTO, where he was responsible for ensuring that the Agency’s technology investments matched the needs of its many missions. Bob holds Bachelor of Science and Master of Science degrees in Statistics from Virginia Tech.

    Quis custodiet ipsos custodes?

    Brush up on your Latin! “Who guards the guards?” was the topic of Bob’s talk. Insider threat in an ever-changing threat landscape was, and remains, our number one cyber security risk.

    “Defense-in-depth isn’t just about putting adequate technology in place, it’s also about paying attention to your people and implementing policies and procedures to reduce the likelihood of an insider attack.”
    - Dawn Cappelli, CERT

    The simple but not-so-obvious model Bob pursued at the CIA was an extension of the seven-layer OSI stack to include non-technical, motivational layers.


    We need to worry about all levels of the stack including layers eight and nine because we all have people messing around at various layers with applications, scripts, communications etc. And their motivation is often very clear.

    “Nemo repente fuit turpissimus!” Or: no one ever became thoroughly bad in one step!

    The point is people don’t just wake up one day and decide to be bad. They are motivated over time by larger causes and in EVERY CASE leave a trail of clues behind that can’t entirely be covered up.

    What to Do?

    According to Mr. Flores, the focus needs to be on real-time visibility. You need visibility into who (or what) is perturbing your enterprise right now and over time. You can tediously review the logs of each device and user, as the CIA used to do, or you can take advantage of Splunk.

    “Splunk may not be the best thing since sliced bread, but it’s pretty darn close.”
    - Bob Flores

    Why Splunk?

    Why did the CIA choose Splunk over so many other security forensic solutions? It all comes down to how easily and scalably Splunk can eat any logs, events and messages Bob’s organization throws at it. Combine that with the real-time search, alert and reporting and over time statistics and analysis on