thebaumblog: Compliance

Cisco CSIRT Presents at SplunkLive Raleigh

Last Thursday Dave Schwartzburg and a few other Cisco security mavens attended SplunkLive Raleigh. The Cisco Computer Security Incident Response Team (CSIRT) has been applying Splunk to corporate security investigations for more than two years now, and Dave was generous enough to share their experiences with us all. Joining Cisco in presenting at the event was James Ervin of the University of North Carolina at Chapel Hill, a very knowledgeable Splunk customer. Patrick Ogden, Splunk Sales Engineer, gave a rocking good demo of transaction tracing in a telco provisioning environment, and Will Hayes, Splunk Sr. Solution Architect, showed the latest Splunk for Cisco Security App, which is being developed together with the Cisco CSIRT team.

Cisco CSIRT Team

Dave Schwartzburg

Dave Schwartzburg is an Information Security Investigator and runs the IDS infrastructure for Cisco Corporate and its internal networks and IT assets. He has an M.S. in Information Security from East Carolina University and a B.S. from the University of Wisconsin. Dave has been with the Cisco CSIRT team for two years; before that he was with AT&T Internet Investigations & Security Services. Cisco has more than 100,000 employees and contractors and more than 127,000 devices on its corporate network. That’s a lot to keep track of, which is why the CSIRT team uses Splunk.

The Cisco CSIRT works to reduce the risk of loss from security incidents for Cisco-owned businesses. CSIRT regularly engages in proactive threat assessment, mitigation planning, incident trending and analysis, security architecture, and incident detection and response. This work happens in three phases: investigation, mitigation and prevention.

A Tier 1 Event Analysis Group in Costa Rica handles security threat monitoring. The Tier 2 Event Analysis Group in Bangalore handles investigations and mitigations for the simpler cases. Dave is part of the Tier 3 Global Incident Response Team, which handles the more difficult cases and longer-term prevention through changes to the infrastructure and security systems.

Cisco Security Environment

Cisco regularly collects web proxy (IronPort WSA), anti-virus (IronPort ESA), host-based intrusion prevention (Cisco Security Agent), syslog, VPN logs, authentication messages, network IDS signatures and NetFlow records from critical subnets.

  • 3 million IDS events per day
  • 3-5 billion NetFlow records per day
  • 300 malware-related cases a day

Some event sources send their data to a global network of collection servers and some event types are pulled from their sources directly to a centralized server. Splunk handles the collection and indexing of the data.

Correlation and Reporting with Splunk

The CSIRT team makes extensive use of scheduled reporting and alerting for proactive monitoring of problems.

In this example, the team correlates host-based IDS events with antivirus logs and runs malware reports via cron using the Splunk CLI. The reports run on a schedule, and the results are emailed to the Event Analysis (EA) teams for processing and submission for remediation.
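
The post doesn’t show the searches behind these reports. As a rough illustration of the correlation step only, the Python sketch below pairs host-IDS (CSA) events with antivirus events that hit the same host within a time window, then rolls the matches up into a per-host report. The `Event` record, its field names and the 30-minute window are assumptions made for illustration; in practice this logic would live in a scheduled Splunk search rather than in standalone code.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    """Hypothetical normalized event; field names are illustrative only."""
    host: str
    time: datetime
    detail: str


def correlate(hids_events, av_events, window=timedelta(minutes=30)):
    """Pair each host-IDS event with every antivirus event on the same
    host that occurred within `window` of it (window is an assumption)."""
    av_by_host = {}
    for ev in av_events:
        av_by_host.setdefault(ev.host, []).append(ev)
    pairs = []
    for h in hids_events:
        for a in av_by_host.get(h.host, []):
            if abs(h.time - a.time) <= window:
                pairs.append((h, a))
    return pairs


def malware_report(pairs):
    """Count correlated hits per host, most affected hosts first."""
    counts = Counter(h.host for h, _ in pairs)
    return counts.most_common()


t0 = datetime(2009, 6, 1, 9, 0)
hids = [Event("laptop-17", t0, "CSA alert"), Event("srv-02", t0, "CSA alert")]
av = [Event("laptop-17", t0 + timedelta(minutes=5), "AV hit"),
      Event("srv-02", t0 + timedelta(hours=3), "AV hit")]  # too far apart
print(malware_report(correlate(hids, av)))  # [('laptop-17', 1)]
```

Requiring both sources to agree on a host within a short window is what turns two noisy feeds into a shorter, higher-confidence list for the analysts.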

“Red Carpet Reports” monitor executive systems to make sure they aren’t infected or compromised. Here we see an example of the Koobface worm found in CSA logs on an executive laptop.

Finally, the team has found ways to make use of all the CSA data it receives. One of the most useful has been pinpointing people who disable Cisco Security Agent itself, which indicates that the machine is now unmanaged.
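
One simple way to express the “agent disabled” check is to keep the latest agent state reported for each host and list the hosts whose most recent event says the agent stopped. The sketch below assumes hypothetical `agent_started`/`agent_stopped` event labels and integer timestamps; real CSA log messages will look different.

```python
def unmanaged_hosts(events):
    """Return hosts whose most recent agent event is a stop.

    `events` is an iterable of (host, timestamp, action) tuples, where
    action is one of the hypothetical labels 'agent_started' or
    'agent_stopped'. Events may arrive out of order.
    """
    latest = {}  # host -> (timestamp, action) of the newest event seen
    for host, ts, action in events:
        if host not in latest or ts >= latest[host][0]:
            latest[host] = (ts, action)
    return sorted(host for host, (_, action) in latest.items()
                  if action == "agent_stopped")


events = [
    ("exec-laptop-1", 100, "agent_started"),
    ("dev-box-9", 120, "agent_stopped"),
    ("exec-laptop-1", 90, "agent_stopped"),  # older than the restart above
    ("lab-pc-3", 50, "agent_stopped"),
    ("lab-pc-3", 60, "agent_started"),
]
print(unmanaged_hosts(events))  # ['dev-box-9']
```

Comparing timestamps instead of trusting arrival order matters when events come in through a network of collection servers.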

Results for the Security Team

The productivity gains from centralized access to multiple data sources have been dramatic. Not only is the team reducing the time it takes to respond to incidents, it is also allowing less experienced staff to handle more complex cases. And, surprisingly, 10% of cases now come from previously unused or underutilized sources. The value of substantially faster access to important data, and of correlation across numerous sources for reporting and ad-hoc investigations, is incredible.

Splunk for Cisco Security App

University of North Carolina Chapel Hill

James Ervin

James has been doing system administration, network and security monitoring and application development at UNC since 1998, when he completed his M.S. in Computer Science at NC State University. As part of the Information Technology Services (ITS) team at UNC, he has worked on projects including the university’s original Active Directory deployment, Unix-based webmail systems and security information and event monitoring. Earlier this year he inherited a centralized logging project for the university. UNC was the nation’s first state university and has served North Carolina for more than two centuries; it has 29,000 students and more than 4,000 faculty members. ITS is the largest IT organization on campus (~500 employees), looking after financials, admissions, centralized learning and centralized email. ITS frequently collaborates with the many other IT organizations on campus.

ITS Environment

The ITS team manages a moderate-size, mixed application, server and networking environment consisting of the following major components:

  • Multiple Unix flavors (AIX, RHEL, Solaris)
  • Large Windows infrastructure
  • ~600 devices total
  • ~20 IPS/IDS/FW/LB devices
  • PDU, environment probe data
  • Apache, Tomcat, JBoss

This environment is constantly in flux as students and faculty come and go and non-managed desktops, laptops and mobile devices connect to the network.

“We needed to determine what is possible within our environment and adopt a flexible architecture.”
- James Ervin

Earlier this year, James and his team were facing an ever-growing list of requirements for their centralized log management project, including:

  • Make syslog services more useful to the rest of the IT organizations
  • Collect and centralize Windows event logs
  • Alert on events of interest
  • Correlate security events
  • Provide NOC/SOC staff access to security logs
  • Give application developers access to application logs
  • Report on unplanned system changes
  • Satisfy the auditors

Splunk Live New York 2009

This week we’re on the East Coast enjoying some fantastic customer presentations and roundtables at Splunk Live events in New York City, Princeton, NJ and Washington, DC. It’s Tuesday, and we have more than 100 customers and Splunk users attending Splunk Live in midtown Manhattan. The vibe is electric as we’re treated to awesome talks by IDT and New York Life. At lunch, long-term customers Bloomberg and AT&T joined the customer roundtable conversation.

Gabe Arnett, Senior Software Architect at Moody’s, demonstrated how Splunk is being used to monitor and troubleshoot the Moody’s Analytics platform. Gabe has more than 15 years of experience building web applications in financial services, investment banking and e-commerce. At Moody’s he’s responsible for the global development team that develops and supports the newly redesigned client-facing website, v3.moodys.com. Moody’s is a leading provider of research, data, analytic tools and related services to debt capital markets and credit risk management professionals. The company’s products and services provide the means to assess and manage the credit risk of individual exposures as well as portfolios; price and value holdings of debt instruments; analyze macroeconomic trends; and enhance customers’ risk management skills and practices.

Moody’s Splunk environment is used by 25 people and runs on Windows 2003. Splunk gives Gabe’s developers secure access to the logs they need without touching the production devices, servers and applications. His team has built custom searches and a number of dashboards showing the general health of their applications and services. Custom searches and alerts track errors and access, helping ensure a good user experience. The team also uses Splunk to understand when and where new content isn’t flowing to the v3 platform. A large part of the Moody’s user experience is delivering email alerts, and Splunk helps the team track GUIDs to ensure customers receive the alerts they’ve subscribed to.

The team recently migrated from Splunk 3 to Splunk 4; the upgrade took 30 minutes. The Splunk for Windows App has been significantly revamped in Splunk 4, and the Moody’s team is using it to monitor local server resources (disk, memory, networking) through WMI and to correlate this performance data with the Windows and application event logs.

Shay Benjamin, CSO and SVP, Architecture at IDT, designs and implements network architectures and manages compliance, security and fraud initiatives at IDT. IDT Corporation (www.idt.net) is a holding company focused on the telecommunications and energy industries. Since 1995 IDT has been building hundreds of VOIP switches globally and assembling an international fiber optic network. IDT pioneered VOIP (Voice over Internet Protocol) to create Net2Phone, piloted the first commercial WiFi phone service in the US and built a prepaid calling card business that sells 12 million calling cards a month.

IDT uses Splunk primarily for VOIP Call Detail Records (CDRs). The company indexes more than 120 million CDRs per day with six mirrored Splunk server instances. CDRs are somewhat like logs, but with many fixed, delimited fields. One or more CDRs are created at each switching or routing point for every VOIP call. CDRs vary between platform devices in the number and contents of their fields, and unlike logs, few CDR fields contain easy-to-read key=value pairs. Although CDRs are key to maintaining service quality, billing, monitoring network quality and security forensics, working with them is labor intensive, and delays waste labor, time and money.

IDT needs fast searches across all fields of the CDRs and quick data loading, to allow fast retrieval of call data and cross-platform searches that unify results from different CDR formats. Historically IDT used a custom RDBMS solution with an application called Call Genius. With the RDBMS, IDT was forced to limit the fields that got indexed, because indexing CDRs in an RDBMS is costly: it takes up a lot of space and slows load times. The RDBMS also indexed only the fields common to multiple platforms’ CDRs. Much of the CDR data was put into BLOBs (actually CLOBs), with multiple CDR fields mapped into a single RDBMS field to try to achieve efficiency. But BLOBs are very difficult to search and to index effectively, and the legacy Call Genius application didn’t permit searching the CDR BLOBs.

Now IDT uses Splunk to index all CDR fields. There is no need to decide which fields to index, and cross-platform searches are easy without losing platform-specific CDR format resolution. There is no longer a need to create BLOBs for efficiency. Engineers and support staff can quickly search for any combination of:

  • Phone Number
  • IP address
  • Trunk Group Name

Splunk naturally links search terms across fields; users just enter the phone number or IP address and get back the CDR events and transactions.
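
To illustrate why indexing every field removes the up-front “which fields?” decision, the sketch below parses delimited CDRs from two platforms with different layouts and builds an inverted index over every field value, so a phone number, IP address or trunk group name matches wherever it appears. The platform names, field layouts and comma delimiter are hypothetical, and this is a toy model of the idea, not how Splunk’s indexer works internally.

```python
from collections import defaultdict

# Hypothetical field layouts for two CDR platforms; real switch vendors
# define their own (usually much longer) layouts.
LAYOUTS = {
    "platform_a": ["call_id", "caller", "callee", "src_ip", "trunk_group", "duration"],
    "platform_b": ["start_time", "call_id", "src_ip", "caller", "callee",
                   "trunk_group", "codec", "duration"],
}


def parse_cdr(platform, line, delimiter=","):
    """Split a delimited CDR line into a dict using its platform's layout."""
    fields = LAYOUTS[platform]
    values = line.rstrip("\n").split(delimiter)
    if len(values) != len(fields):
        raise ValueError(f"{platform}: expected {len(fields)} fields, got {len(values)}")
    record = dict(zip(fields, values))
    record["platform"] = platform
    return record


class CdrIndex:
    """Inverted index over every field value, so no field has to be
    chosen for indexing up front."""

    def __init__(self):
        self.records = []
        self.postings = defaultdict(set)  # field value -> record positions

    def add(self, record):
        pos = len(self.records)
        self.records.append(record)
        for value in record.values():
            self.postings[value].add(pos)

    def search(self, *terms):
        """Return records that contain every term, in any field."""
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.records[pos] for pos in sorted(hits)]


index = CdrIndex()
index.add(parse_cdr("platform_a", "c1,15551230001,15559870002,10.0.0.5,TG_EAST,120"))
index.add(parse_cdr("platform_b",
                    "2009-06-09T10:00:00,c9,10.0.0.7,15551230001,15557770003,TG_WEST,G729,45"))
print(len(index.search("15551230001")))                       # 2
print(index.search("15551230001", "10.0.0.7")[0]["platform"])  # platform_b
```

Because each platform keeps its own field names, a cross-platform search still returns the full, platform-specific record rather than a lowest-common-denominator subset.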

Comparing Splunk to the RDBMS solution, IDT found Splunk searches to be 50 to 100 times faster than RDBMS searches on non-indexed data. Searches on indexed fields are also faster in Splunk than in the previous RDBMS solution. Splunk load times for a typical sample average 1 to 5 minutes, versus 20 to 40 minutes for the RDBMS.

IDT is in the process of feeding firewall, security, router, IP network and switch data into Splunk as well. They’re already finding that Splunk catches errors not captured by their network management consoles, and it has provided valuable troubleshooting help during recent datacenter migrations.

Most of all, IDT is looking forward to discovering new ways to use all the data in Splunk. Heuristic analysis and business intelligence applications are at the top of their list, including using Splunk to find human “Family and Friends” networks and drive the development of new commercial programs.

Splunk 4 Lands in the Southwest

Last week we continued our road show launching Splunk 4 through the Southwestern US in Phoenix, San Diego and Los Angeles. This was our second annual gathering of customers, partners and users, and attendance at this year’s Splunk Live events more than doubled. In the morning we held a three-hour, hands-on technical workshop. Attendees had the opportunity to install and configure Splunk 4 on their laptops or a remote server and get one-on-one assistance from the Splunk team. Afternoon sessions and dinner focused on customer presentations. We’re very grateful to all the presenters who took time out of their busy days to share how Splunk is transforming their IT environments. I captured some notes from the week and thought I’d share them with you.

Early Warning

In Phoenix we had a packed house at the Sanctuary conference center on the side of Camelback Mountain. At 109 degrees I decided against hiking up it in the early AM. Dave Bridgeman, Data Security Engineer at Early Warning, kept things cool by showing the audience how his company uses Splunk in its security operations center. Early Warning collaborates with major financial services companies to facilitate fraud detection through shared information and knowledge in cross-institution environments. The company has an interesting history: it spun out of First Data and is now primarily owned by Bank of America, BB&T, JPMorgan Chase and Wells Fargo.

Dave is a well-rounded IT professional who started as a developer and then moved into network and security management. He currently leads the data security team for Early Warning. The environment he oversees includes a variety of platforms, including AS400s, MP300s, AIX, Solaris, Linux and Windows. He uses a combination of Splunk forwarders and syslog forwarders to collect Java and COBOL application logs and FTP/SFTP networking logs.

The Early Warning Splunk installation is designed to track transactions and users from one bank to the next in cross-institution activities. Transaction ID tracing correlates events across applications and services, and Splunk alerts the team when jobs fail so the operations and development teams can securely troubleshoot issues on the fly. Remote accessibility means no more driving into the office in the middle of the night to access locked-down servers. On the security side, Splunk helps Dave’s team track and monitor known fraudsters and bad user names, allowing them to stay vigilant when monitoring external attacks. They also use Splunk to deliver reports for customers, executive committee members and the Security Advisory Committee (with representatives from the founding banks).
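
Transaction ID tracing boils down to grouping events from different systems by a shared identifier and reading each group in time order. The Python sketch below does exactly that and flags transactions containing a failed step, which is the kind of condition an alert would fire on. The event schema and the `FAIL` status value are assumptions; Early Warning’s actual setup presumably relies on Splunk searches rather than code like this.

```python
from collections import defaultdict


def trace_transactions(events):
    """Group events from different applications by transaction ID and
    order each group by time.

    Each event is a dict with 'txn_id', 'time', 'app' and 'status' keys
    (a hypothetical schema chosen for this sketch).
    """
    traces = defaultdict(list)
    for event in events:
        traces[event["txn_id"]].append(event)
    for steps in traces.values():
        steps.sort(key=lambda e: e["time"])
    return dict(traces)


def failed_transactions(traces):
    """Return IDs of transactions with at least one failed step, i.e. the
    ones that would trigger an alert."""
    return sorted(txn for txn, steps in traces.items()
                  if any(step["status"] == "FAIL" for step in steps))


events = [
    {"txn_id": "T1", "time": 3, "app": "settlement", "status": "OK"},
    {"txn_id": "T1", "time": 1, "app": "gateway", "status": "OK"},
    {"txn_id": "T2", "time": 2, "app": "gateway", "status": "OK"},
    {"txn_id": "T2", "time": 4, "app": "batch-job", "status": "FAIL"},
]
traces = trace_transactions(events)
print([e["app"] for e in traces["T1"]])  # ['gateway', 'settlement']
print(failed_transactions(traces))       # ['T2']
```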

Amkor

Henry Grant of Amkor, a $2.1B provider of packaging/assembly and testing services for the semiconductor industry, also presented an overview of how his Corporate Data Center team uses Splunk. Henry oversees operations for the company’s SAP, PLM, Supply Chain, Hyperion and Oracle systems. Amkor has a heterogeneous environment of Sun Solaris, IBM iSeries, Cisco ASA firewalls, packaged and custom web and J2EE applications, and TACACS/RADIUS accounting and access control technologies. With manufacturing locations in China, Japan, Korea, Taiwan, Singapore and the Philippines and headquarters in Chandler, AZ, the Amkor team is challenged by log and event data overload. Gigabytes of data a day generated at multiple points make operational troubleshooting and security investigations extremely complex.

SOX Compliance

Proving SOX compliance has traditionally meant writing and maintaining scripts to collect and report on errors, access controls and log access activities. Segregating duties was impossible given the lack of access control on the logs and events themselves. Splunk has replaced that awkward script writing and maintenance: it collects iSeries, Unix and application events and logs and provides automated scheduled reports. The team is now expanding the Splunk footprint to handle network and Oracle logs as well.

Application and System Monitoring

Like most enterprise IT shops, Amkor has figured out that traditional point monitoring tools aren’t enough: they have a hard time scaling to all the modern-day technologies, require intrusive agents, and only work for known events rather than anomalies and unknowns. Too many issues end up being reported by end users rather than by the monitoring systems. With Splunk, Henry’s team detects event anomalies in real time and has cut its response time by hours per incident.
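
The post doesn’t say how Amkor defines an anomaly. One common, simple approach, swapped in here purely for illustration, is a rolling z-score: compare each interval’s event count against the mean and standard deviation of the preceding intervals. The window of 10 intervals and the threshold of 3 standard deviations below are arbitrary assumptions.

```python
import statistics
from collections import deque


def detect_anomalies(counts, window=10, threshold=3.0):
    """Return indexes of counts that are more than `threshold` standard
    deviations away from the mean of the previous `window` counts.

    `counts` is a sequence of event counts per interval (e.g. per minute).
    The first `window` intervals only build up the baseline. The default
    window and threshold are arbitrary choices for this sketch.
    """
    history = deque(maxlen=window)  # sliding baseline of recent counts
    anomalies = []
    for i, count in enumerate(counts):
        if len(history) == window:
            mean = statistics.fmean(history)
            stdev = statistics.pstdev(history)
            if stdev == 0:
                # Perfectly flat baseline: any change at all is unusual.
                if count != mean:
                    anomalies.append(i)
            elif abs(count - mean) / stdev > threshold:
                anomalies.append(i)
        history.append(count)
    return anomalies


per_minute = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 50, 10]
print(detect_anomalies(per_minute))  # [10]
```

Because the baseline is learned from the data, nothing has to be known about the event in advance. The trade-off is that a spike stays in the window for a while and inflates the baseline; a production detector might exclude flagged intervals from it.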

Tools for the Help Desk

Sometimes it’s the simple things that cut your response time, escalations and IT budget. The Amkor team noticed a lot of calls and emails about VPN set-up and access across the company. With Splunk, level 1 help desk agents are now able to resolve most VPN issues without creating an escalation. Henry’s team built a VPN dashboard, driven by a series of searches and reports, that gives entry-level help desk personnel the insight they need to troubleshoot problems right away.

Henry’s Splunk Tips

The best part of Henry’s overview was his list of tips for a successful Splunk implementation. I’ve included the list here in hopes that these may help you as well.

  • Provide training that caters to each group’s need.
  • Use the Deployment Server.
  • Develop a Common Information Model.
  • Update and change as needed.
  • Use Tagging to Normalize Data.
  • Monitor Scheduled Compliance Reports by using the Audit Logs.
  • Build Splunk into your processes where possible.
  • Set up a Test/Dev Environment and a Test/Dev Index.
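
Two of Henry’s tips, developing a Common Information Model and using tagging to normalize data, amount to mapping each source’s field names and values onto shared ones. In Splunk this is normally done through configuration (field aliases and tags) rather than code; the Python below only illustrates the mapping. The source names, alias tables and the `blocked` tag are hypothetical.

```python
# Hypothetical per-source aliases from vendor field names to common names.
FIELD_ALIASES = {
    "cisco_asa": {"src": "src_ip", "dst": "dest_ip", "usr": "user"},
    "tacacs": {"remote_address": "src_ip", "username": "user"},
}

# Hypothetical tags keyed on (common field, value) pairs, so different
# vendors' words for the same outcome end up searchable under one tag.
TAGS = {
    ("action", "deny"): "blocked",
    ("action", "reject"): "blocked",
}


def normalize(source, event):
    """Rename a raw event's fields to common names and attach tags."""
    aliases = FIELD_ALIASES.get(source, {})
    out = {aliases.get(field, field): value for field, value in event.items()}
    out["tags"] = sorted({tag for (field, value), tag in TAGS.items()
                          if out.get(field) == value})
    return out


print(normalize("cisco_asa", {"src": "10.1.1.1", "usr": "bob", "action": "deny"}))
# {'src_ip': '10.1.1.1', 'user': 'bob', 'action': 'deny', 'tags': ['blocked']}
```

Once every source reports `user` and `src_ip`, a single search or dashboard works across firewalls, access control servers and applications alike.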

Intuit Consumer Group

The Intuit team of Jeff Ludwig, Chief Architect, and Larry Raab, Architect, of the Consumer Group joined us to share how they use Splunk in production support operations. Jeff leads the Consumer Group’s Connected Services Development for electronic and print tax and payroll filings for TurboTax, ProSeries, Lacerte and QuickBooks. Larry, who specializes in large-scale, highly available application and systems architecture, is responsible for the Consumer Group’s applications and infrastructure.

While the original use for Splunk at Intuit was application management, Jeff and Larry covered three additional ways they have applied it: reliable monitoring, improving user experience, and large-scale reporting for compliance and business intelligence.

The Great Firewall of China: Internet Censorship Run Wild

The past couple of days I’ve been visiting China, meeting with some of our technology and channel partners. It just so happens I was in Beijing for the 20th anniversary of the 1989 Tiananmen Square events. Yes, it really did happen, despite what the Chinese government says. After speaking on Saturday at the F5 APAC Sales Kickoff, I found myself staying over the weekend with Sunday off to roam around Beijing like a tourist, something I rarely get a chance to do on business trips. It amazes me to see how the Chinese and Taiwanese work on Saturdays; in the US we rarely see that. Europeans chastise Americans for working too hard, but I guess they should see the work ethic in Asia, and then we’d look more normal.

Watching the 2008 Beijing Olympics last summer, things there certainly seemed more normal than 20 years ago, but being there in person with all the festivities gone, things seemed really strange to me. It is very difficult to describe. Maybe I was jaded by all the newspapers I’d read on the way to Beijing. On a nice long 13-hour flight from Washington, DC with plenty of reading material, I consumed James Kynge’s piece in the Financial Times questioning whether the Western media really understood why the student demonstrators were protesting. He went on to question whether the word “democracy” describes the students’ motivations, and whether we or they really knew what it meant, even though he spells out their desires in plain old English that sounds like democracy to me.

“Almost everything fell within its scope: campaigns against corruption, nepotism, inflation, police brutality, bureaucracy, official privilege, media censorship, human rights abuses, cramped student dormitories and the smothering of democratic urges. But to say the demonstrations were to “demand democracy” is an oversimplification.”
James Kynge, Financial Times

It’s almost impossible to describe the strange feeling I got while walking through Tiananmen Square, observing the soldiers and the huge portrait of Chairman Mao that dominates the landscape. Maybe part of it was due to the increased tension of the anniversary. Maybe not. Tiananmen has come to symbolize the unspoken and largely unrecognized tension between the economic progress driving modern China and the old-fashioned communist government still ruling there. The Chinese seem to have a foot in both camps. The eeriness I felt came not only from my surroundings and an understanding of the principles they stood for but also from the reaction of my Chinese and Taiwanese friends. Their usually jubilant, outgoing personalities were completely subdued in the square. Was it a sign of respect and mourning? Perhaps to some extent. But in quiet whispers and conversations out of earshot of any “green” uniformed soldiers (as opposed to the “blue” uniformed security guards), they confessed to being scared to speak for fear of someone or something listening. Challenging them, I said, “Surely you must be joking.” But it was no joke. Only when we crossed the street into the Forbidden City did their usual personalities return.

Of course this began a prolonged conversation over the next 24 hours as we visited the Great Wall and a new Beijing restaurant and departed through the impressive new Beijing airport. I kept asking and trying to understand. How can a country of so many people be controlled by the minds of so few? What are the real limits on speaking out? And what effect will economic progress have on the political future of China? There was no shortage of stories supporting the fact that the government still takes a very heavy hand with those who disagree. But rather than discuss it, everyday Beijing seems to sweep the events of 20 years ago under the rug. As one of my Chinese friends said, “everyone is embarrassed and we just pretend it never happened.”

At the same time I was traveling throughout China, articles started pouring in about Beijing’s efforts to step up Internet and IT censorship. Reading the perspectives on “Green Dam,” I was reminded of the impact the technology industry is having on the whole situation. It was bad enough I couldn’t get to sites like Twitter and YouTube from my hotel room. Now the Chinese government is requiring that every PC sold in the country starting July 1 include special software blocking all sorts of things. The move is being presented as an attempt to protect children from online pornography, but it is obviously one more attempt by Beijing to take its censorship to a new level. China currently has the world’s most sophisticated and multi-layered system of Internet censorship. Objectionable content on domestic Web sites is deleted or prevented from being published, and access to a large number of overseas Web sites is blocked or “filtered.” Decisions about what to censor are based on the Chinese government’s attempts to control the minds of 1.2B Chinese. There is no transparency or accountability, no public consultation in developing block lists or censorship criteria, and no way to appeal the blockage or removal of Web content.

In a notice to PC makers, the Ministry of Industry and Information Technology (MIIT) said all PCs shipped in China needed to offer Green Dam/Youth Escort, identified as “green internet filtering software,” either pre-installed or as part of basic software packages. In May 2008, the government picked Jinhui Technology and Dazheng Language Technology, two Chinese software companies, to develop the software, according to a contract award notice from the MIIT. These companies claim their software is used only to block sites, but last year researchers discovered that a Chinese version of Skype contained the ability to block politically sensitive words in instant messaging chats and to keep a record of the use of such words.

Splunk Lab in Asia Launches to Develop New IT Search Apps

The last two weeks I’ve been traveling throughout Asia with our new partners at Systex and the Splunk Asia team. In Singapore, Hong Kong, China and Taiwan we met with government agency, high tech manufacturing, insurance, online gaming and managed service provider customers who told us how critical Splunk is to their IT organizations, especially as budgets get even tighter.

Systex is now our master distributor covering Taiwan, China, Hong Kong, Singapore, Thailand and Malaysia. Systex is an amazing company fueled by Taiwanese entrepreneurship, creativity and innovation. The company is part distributor, part reseller, part system integrator and part independent software developer. The 2,900 Systex employees are led by CEO Hilo Chen and COO Frank Lin. Hilo did a stint at Yahoo! Asia before joining Systex as CEO. He is a very friendly, engaging and good-natured executive who commands the passion of his team. Frank is detail-oriented and intense, and he has an ability to focus on what seems impossible and get it done.

I’m not used to people pushing faster than I do, but the Systex team is reminding me what start-up speed is all about.

The Systex system integration and software business is fueled by more than 1,400 engineers with deep domain expertise in financial trading and banking systems, network security, database administration, storage, virtualization, disaster recovery, IT service management, telecommunications OSS/BSS, unified communications, business intelligence and more. This past week we unleashed the creativity of more than 400 of those engineers, product managers, sales personnel and business unit heads at a three-day kickoff event for the launch of a joint Splunk Lab, designed to come up with new areas to apply IT Search and new Splunk Apps for a variety of use cases.

It is our hope that our joint work together will result in lots of new Apps available for download by Splunk users all over the world.

The event started Thursday with a press conference at the Westin in Taipei, attended by more than three dozen journalists covering innovation in Asia. We discussed the design of the partnership, the Splunk Lab and some of our joint customers, including Allianz Insurance, IAH Games and the Malaysian Prime Minister’s Office. Allianz is using Splunk to report on F5 BIG-IP load balancer activities. IAH is mining its online multi-player game events and logs for insight into user patterns and activities, including market basket analysis across different game properties. The Malaysian PM’s office uses Splunk to secure its email messaging system.

The press asked some very good questions about various use cases and our strategy for accelerating activities in Asia with Systex. Richard Tang and Johnny Lin attended the event from Systex as well and provided a great overview of how the Splunk Lab is coming together and what kind of solutions Systex is creating around Splunk. Richard has been very patient with me and has taught me enough Mandarin to completely embarrass myself during my last few visits.

On Friday 260 engineers and product managers attended an all-day Splunk Boot Camp at the Systex UCOM training center in downtown Taipei. The day was divided into two three-and-a-half-hour sessions. Each session covered using, administering and deploying Splunk. There was also a brief section on developing Splunk Apps, including building a network management application.

One of the product managers commented to me at the end of the day, “My mind is broken on Splunk, there is so much you can do with it.”

Saturday’s session was the Splunk Lab kickoff event and creative activity, attended by 300 business unit heads, sales people, product managers and field sales engineers. I was amazed. We went from 8:30am to 6:30pm on a Saturday. The level of energy was unlike anything I’d ever experienced before. Taking the long trip back from Taipei by way of Tokyo, I am in awe at how two organizations half a world apart have bonded so tightly in just six months. I’m very impressed by the Taiwanese work ethic and dedication.

Kord Campbell, Splunk’s Director of the Developer/ISV Program, gave a great talk on developing Splunk Apps to kick off the working roundtables. Each business unit (twelve in all) spent three hours coming up with ideas for Splunk in their unit, including what Splunk Apps they were going to create and which customers they were targeting. The areas included:

  • Financial Trading Platforms
  • Banking and ATM Systems
  • Database Services
  • Information and Security
  • Business Continuity and Disaster Recovery
  • Customer Service
  • Data Management & Integration
  • Unified Communications
  • IT Service Management
  • Education & Training

Teams were judged on several factors including creativity, feasibility, significance to current business and target customer profiles.

The winning team didn’t use slides but instead acted out their presentation in a 15 minute skit. It was wild and reminded me of how dysfunctional most IT organizations are today. Not that we needed reminding :-)

The Financial Services Business Unit was judged the winner. This team has developed market trading platform software in a joint venture with Reuters and explored using Splunk with its quote and trading solutions and for market compliance. The first scenario involved monitoring TAIFEX, TWSE and OTC trades and examining patterns indicating potentially fraudulent activities.

The second scenario showed how IT Search can be applied to troubleshooting the electronic system, including the buy side, sell side, cash position, web interfaces, trading systems and risk management. Actors in the scenario included investors, web infrastructure managers, dealer groups, trading managers, CRM users and back office personnel. The team called their solution “A Lighthouse in the Dark.”

Perhaps the most interesting integration of Splunk, though, was mining data from the web application platform to determine which features users tapped into and which ones they tried once and never went back to. By examining page views for new functions and correlating them with trade volume deltas, the team can continuously monitor the revenue effects of application and site changes.
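
Both halves of that analysis are easy to sketch: count, per feature, how many users came back versus tried it only once, and compute a correlation between a feature’s page views and trade-volume changes. The `(user, feature)` input shape and the plain Pearson coefficient below are assumptions for illustration, not the Systex team’s actual method.

```python
from collections import Counter, defaultdict
from math import sqrt


def feature_retention(page_views):
    """Map each feature to (returning_users, one_time_users).

    `page_views` is an iterable of (user, feature) pairs, one per view
    (an assumed input shape).
    """
    views = defaultdict(Counter)  # feature -> Counter of views per user
    for user, feature in page_views:
        views[feature][user] += 1
    result = {}
    for feature, per_user in views.items():
        one_time = sum(1 for n in per_user.values() if n == 1)
        result[feature] = (len(per_user) - one_time, one_time)
    return result


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences,
    e.g. daily page views of a feature vs. daily trade-volume deltas."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


views = [("u1", "charting"), ("u1", "charting"), ("u2", "charting"), ("u3", "alerts")]
print(feature_retention(views))       # {'charting': (1, 1), 'alerts': (0, 1)}
print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
```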

The Splunk Lab launch has us thinking about how to get other people collaborating to build new applications for IT Search. We’re planning to launch a public site soon that will allow domain experts from all over the world to work together and create great Splunk Apps. So we decided to take the elevator to the top floor of Taipei 101, the world’s tallest building, to look for more…


Photos:
  • Top Floor at Taipei 101
  • View to the East of Taipei
  • Press Conference
  • Frank Lin, COO, Systex
  • Me
  • Robert Lau (Splunk) and Emy (Systex)
  • Hilo Chen, CEO, Systex
  • UCOM Technical Training Center
  • Kord Campbell, Splunk
  • Splunk Lab Team Competition
  • Winning financial services App
  • A little bit of fun
  • Taipei 101, World’s Tallest Building

Splunk Live Southwest 2008

This week we’ve been moseying through the Southwestern US with our Splunk Live show. We changed up the format a bit, with Splunk technical workshops in the morning and customer roundtables in the afternoon. The technical workshops were a big hit, with more than 200 people registered to engage with our Splunk experts. During the workshop, attendees were able to download, install, configure and start using Splunk on their laptops or on a server with remote access. The best part about Splunk Live events, though, is sharing ideas with other Splunk fanatics.

Ryan Peterson from Infusionsoft, a marketing automation company, gave a great talk in Scottsdale about his Splunk deployment for the company’s email infrastructure. Ryan is tasked with keeping more than 12M emails a week flowing out of the system to support Infusionsoft’s Automated Follow-up Technology (AFT). Ryan manages multiple servers in different geographies and also has to meet PCI compliance requirements. He demonstrated using Splunk to troubleshoot problems spread across the messaging infrastructure, address reporting inaccuracies and deliver PCI reports to auditors. He’s even indexing the content of email with Splunk using a scripted LDAP data input. Cool stuff.

In San Diego, Tony Doan of the Genomics Institute at the Novartis Research Foundation (GNF) and Eric Van Johnson from Sony Consumer Electronics joined us. Tony is a security engineer and former pen tester. He also confesses to being a recovering Unix sysadmin. GNF has 600 Windows desktops and several hundred Windows and Linux servers supporting the discovery of new biological processes and improved human therapeutics. Tony discussed how they splunk Cisco CSC, Bluecoat, Symantec AV, Arpwatch, Cisco Switches and Wifi access points to find what he calls “previously unknowns” to improve operational availability and security. He says they’re finding new uses every day, but Tony’s favorite is splunking Cisco IPS and Cisco MARS events looking for odd behaviors. Next up for GNF is eating Windows Event Logs and Windows Registry inputs together with summary indexing for consolidated reporting.

Eric Van Johnson is the eServices Hosting and Operations Manager at Sony Consumer Electronics. He led a great discussion on splunking IBM WebSphere and MQ Series events, including how Sony has integrated operations and development environments to identify problems with complex apps more quickly and avoid unnecessary escalations to the development team. He shared with us Sony’s roll out of Splunk to their Business Intelligence Group. The idea is to complement aggregated WebMethods data reporting for business activity monitoring. Next up he wants to feed Splunk data back and forth with Verizon’s hosting operations, since some of the Sony servers are hosted at Verizon and Verizon is also using Splunk.

In LA, Rich Horace, Director of Systems Engineering and Operations at Fox Interactive Media, demonstrated how Fox uses Splunk in the Fox Audience Network. Basically these are the guys that serve web advertisements across all the Fox properties including MySpace, Rotten Tomatoes, Fox Sports and IGN. He’s challenged with launching new monetization platforms and keeping the existing ones running. Rich gave a fantastic overview of his Splunk installation, which consolidates and aggregates data from disparate systems in order to protect against hackers and meet PCI and SOX requirements. He currently runs an environment with ~600 Linux servers, load balancers, NetApps and network switches. So far he’s indexed 1.5B events. We engaged with everyone in a lively discussion about securing production sites from developers and controlling and auditing access to data using Splunk’s access controls and search filters. Rich also discussed how Fox is using Splunk to integrate with various Citrix products including Netscaler and XenApp.

Thanks to everyone who shared their stories with us this week, it was really awesome.

Ode to Log Management

I love “log management.” I hate log management.

I love log management because years ago it was the impetus for IT to move beyond simple SNMP monitoring to collecting and trying to understand a much richer set of data about complex environments.

I hate log management because over the years it has been co-opted by vendors and analysts who’ve pigeonholed it into yet another IT management silo. These vendors and analysts have narrowly defined log management as the collection and storage of logs in some locked repository used to generate static reports to satisfy regulators, auditors and IT governance boards.

Why am I so bitter?

First, it turns out logs are critical to many other stakeholders in the enterprise. Operations needs real time access to logs in order to find and fix problems and improve mean time to recovery (MTTR). Security needs logs to catch bad guys. Business people need logs to understand customer and service behavior and provide service level measurements. So locking up logs in a static repository designed for one constituency severely limits their value and diminishes the return on investment, not only in a log management solution but also in your IT assets overall.

Second, logs alone don’t provide any of the IT stakeholders with a complete picture.

Let’s take a simple example right from the hottest compliance use case today — PCI. The Payment Card Industry (PCI) Security Standards Council founded by American Express, Discover Financial Services, JCB International, Mastercard and Visa has outlined requirements for security management, policies, procedures, network architecture and software design. If you are a merchant accepting credit or debit cards and you process more than 20,000 transactions per year there are twelve specific requirements. Failure to comply with the requirements is not an option. You can be fined heavily and you can lose your ability to accept credit and debit cards.

One of the twelve requirements is the commitment to monitoring and investigating changes to configuration and password files for any application, server or device involved in the processing of cardholder information and transactions. In the case of file content, permission or attribute changes, logs will only tell me part of the story. Yes, a Windows, Linux or Unix log will tell me a file has been changed, but it won’t tell me who changed it. It also won’t tell me whether the change was authorized. To understand who changed a file I need to look at the user processes running on that server at the same time the file was changed. What user processes were running and who owned them? In Unix or Linux this information is easily viewed with a simple “ps” or “top” command but doesn’t exist in any log. To understand whether the change was authorized I need to compare the log and file change information with the user information and any tickets from the service desk authorizing this user to make this type of modification.
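
To make the “who changed it” step concrete, here is a minimal sketch that correlates a file-change event with periodic process snapshots (the kind of data a scheduled “ps” run would capture). The record types, field names and the two-minute window are hypothetical choices for illustration, not a Splunk data format; the result is a list of candidate users to investigate, not proof of who made the change.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class FileChange:          # hypothetical file-change event
    time: datetime
    path: str

@dataclass
class ProcessSample:       # hypothetical row from a periodic ps snapshot
    time: datetime         # when the snapshot was taken
    user: str
    command: str

def candidate_users(change, samples, window=timedelta(minutes=2)):
    """Return owners of processes sampled within `window` of the change,
    most frequently seen first (ties broken alphabetically)."""
    counts = {}
    for s in samples:
        if abs(s.time - change.time) <= window:
            counts[s.user] = counts.get(s.user, 0) + 1
    return sorted(counts, key=lambda u: (-counts[u], u))

# Hypothetical example data.
t = datetime(2008, 6, 1, 10, 0)
change = FileChange(t, "/etc/passwd")
samples = [
    ProcessSample(t - timedelta(minutes=1), "alice", "vi /etc/passwd"),
    ProcessSample(t, "alice", "bash"),
    ProcessSample(t + timedelta(minutes=1), "bob", "top"),
    ProcessSample(t + timedelta(minutes=30), "carol", "bash"),  # too late to matter
]
print(candidate_users(change, samples))
```

The window size is a trade-off: too narrow and a snapshot taken between polls misses the culprit, too wide and every logged-in user looks suspicious.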

The real reason I believe we need to move on from talking about log management is log management isn’t a market. It isn’t a solution. It is a feature in a much broader landscape of harnessing all the data being generated by our IT infrastructures.

Turning all that data into information for every stakeholder is important to the future of IT as environments grow more complex, dynamic, service oriented, virtualized and mission critical. Not just to report on compliance controls, but to improve our speed of root cause analysis, increase our ability to quickly and comprehensively investigate security attacks and develop more intimate relationships with our customers by better understanding their behavior and providing a transparent view of the services they are receiving in return.

New Splunk Apps Launch at Interop and MMS

This week we were rolling in Las Vegas with Interop at one end of the strip and the Microsoft Management Summit at the other end.

At Interop we launched the Splunk for Change Management app. And at MMS the Splunk for Windows Management app made its debut.

Both apps make use of the Splunk Platform which provides a common set of services and APIs making it easy to create and integrate applications that leverage vast amounts of IT data. These are the second and third applications in a series of new releases we’ll be doing this year.
Splunk for PCI was the first app launched last quarter.

Splunk for Change Management App

Splunk for Change Management takes advantage of the fact that we index not just logs but configurations and file system changes as well. It also leverages a little-known (but I think soon to be much more popular) Splunk search command called diff. Diff lets you easily compare two search results and returns a single result that is the difference between the two. You can compare values of specific fields of results as well as every line of multi-line events and files. This makes it really easy to compare configurations across lots of locations. Splunk for Change Management leverages these capabilities and brings integrated change audit, change detection and change validation.
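
The idea behind diff is easy to picture outside of Splunk. The sketch below uses Python’s standard difflib rather than Splunk itself, and compares two hypothetical snapshots of the same configuration file line by line; it illustrates the concept only and says nothing about how Splunk’s diff command is implemented.

```python
import difflib

def config_diff(before: str, after: str, name: str = "config"):
    """Return unified-diff lines between two snapshots of the same config."""
    return list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile=f"{name} (before)", tofile=f"{name} (after)", lineterm=""))

# Hypothetical snapshots of an sshd configuration taken at two points in time.
before = """Port 22
PermitRootLogin no
PasswordAuthentication no"""
after = """Port 22
PermitRootLogin yes
PasswordAuthentication no"""

for line in config_diff(before, after, "sshd_config"):
    print(line)
```

Lines prefixed with “-” exist only in the earlier snapshot and lines prefixed with “+” only in the later one, so a single changed setting stands out immediately even in a long file.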

Now you can detect unauthorized changes by indexing your trouble tickets and ticketing system logs together with your service, device and application events and configurations. We use Jira internally and find indexing our Jira tickets enables us to immediately know whether a change was authorized. No more jumping between redundant and siloed consoles searching for the answer or writing all kinds of complicated data transformation scripts to compare the output of different management systems.
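
Once change events and tickets live side by side, the authorization check boils down to a join. This minimal sketch assumes a hypothetical rule, that a change is authorized only if a ticket names the same host and user and its approved window contains the change time; real service desks and change policies will differ, and the ticket key format shown is made up.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Change:              # hypothetical change event
    host: str
    user: str
    time: datetime

@dataclass
class Ticket:              # hypothetical service desk ticket
    key: str               # made-up ticket ID
    host: str
    assignee: str
    start: datetime        # start of the approved change window
    end: datetime          # end of the approved change window

def is_authorized(change, tickets):
    """True if some ticket covers this host and user and its window contains the change."""
    return any(
        t.host == change.host and t.assignee == change.user
        and t.start <= change.time <= t.end
        for t in tickets
    )

def unauthorized(changes, tickets):
    return [c for c in changes if not is_authorized(c, tickets)]

tickets = [Ticket("OPS-101", "web01", "alice",
                  datetime(2008, 5, 1, 22, 0), datetime(2008, 5, 1, 23, 0))]
changes = [
    Change("web01", "alice", datetime(2008, 5, 1, 22, 15)),  # inside the approved window
    Change("web01", "bob", datetime(2008, 5, 1, 22, 20)),    # no ticket for bob
]
for c in unauthorized(changes, tickets):
    print(f"UNAUTHORIZED: {c.user} changed {c.host} at {c.time}")
```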

And for the first time we introduce to the industry the concept of Change Validation. Today many of us have the ability to blast out patches to hundreds of servers and devices automatically. But how do we know that the changes had the desired effect? By observing the state and events generated by the actual patched systems we can now compare actual behavior before and after the change. Splunk brings change audit events and configuration data together with activity and error logs so you can connect change with actual system and user behavior.
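
One simple way to picture change validation is to compare an error rate in equal windows before and after a patch. The sketch below assumes hypothetical (timestamp, level) events, a one-hour window and a one-percentage-point tolerance; these are illustrative choices, not how the Splunk app defines validation.

```python
from datetime import datetime, timedelta

def error_rate(events, start, end):
    """Fraction of events with level "ERROR" whose timestamp is in [start, end)."""
    in_window = [level for ts, level in events if start <= ts < end]
    if not in_window:
        return 0.0
    return in_window.count("ERROR") / len(in_window)

def validate_change(events, change_time, window=timedelta(hours=1), tolerance=0.01):
    """Compare error rates in equal windows before and after a change.
    Returns (passed, before_rate, after_rate)."""
    before = error_rate(events, change_time - window, change_time)
    after = error_rate(events, change_time, change_time + window)
    return after <= before + tolerance, before, after

# Hypothetical events: one error in ten before the patch, five in ten after.
patch_time = datetime(2008, 5, 1, 22, 0)
events = [(patch_time - timedelta(minutes=5 * i), "ERROR" if i == 1 else "INFO")
          for i in range(1, 11)]
events += [(patch_time + timedelta(minutes=5 * i), "ERROR" if i % 2 else "INFO")
           for i in range(10)]
print(validate_change(events, patch_time))
```

Here the patch fails validation because errors jump from 10% to 50% of events in the hour after it was applied.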

The app includes:

  • Out-of-the-box dashboards with over 40 reports showing changes across all datacenter components including applications, servers and network devices.
  • Predefined alerts that detect unauthorized change on the basis of configuration variances and correlation with service desk systems.
  • Predefined searches to help identify service-impacting changes quickly.
  • Integration with service desk systems to close the loop on change management by validating the effect of change on system behavior.

Splunk for Windows Management App

This new app integrates Microsoft’s System Center Operations Manager’s command-and-control view of a Windows infrastructure with Splunk’s IT Search. The latest version of Splunk now indexes all IT data generated by Windows servers and applications — event logs, registry keys, performance metrics and application log files. Everything is searchable from a single place to resolve service-impacting incidents faster, enhance monitoring coverage, and validate service levels.

What’s really cool is Splunk searches can be launched through Tasks in the System Center Operations Manager Console on any aspect of the infrastructure being monitored, and can be expanded to include far-flung elements of the IT infrastructure for additional context, regardless of platform or technology. It’s super fast to identify information across the Windows Event Log, the Windows

Splunk and US Federal Government Agencies

This week we’re at FOSE 2008 demonstrating how we’re collaborating with US Federal Agencies. A number of agencies have already joined the Splunk community including:
  • Executive Office of the President
  • Federal Bureau of Investigation
  • NASA
  • Social Security Administration
  • US Department of Agriculture
  • US Department of Defense
  • US Department of Energy
  • US Department of Homeland Security
  • US Department of Interior
  • US Department of Justice
  • US Department of Labor
  • US Navy
  • US Department of State
  • US Department of Transportation

Many of these customers are applying Splunk to extreme applications with large data volumes from many different disparate sources. As you can imagine the complexity of security and compliance concerns, agency interactions and a sophisticated web of outsourcing to federal system integrators provides fertile ground for IT Search as a new way of solving all kinds of problems.

Typically our collaboration involves operations, security and compliance people from both the agency and system integrator sides. Agencies continue to pursue cost cutting and outsourcing while taking on a host of new projects every year. And system integrators continue to search for ways to bid more competitively by demonstrating more efficient ways to develop, deploy and manage technology. This means the business of managing our nation’s IT infrastructure is significantly more complex and dynamic than ever.

As an example, the current state of the world demands a serious risk management approach to Federal Government systems. All agencies have implemented some type of defense-in-depth security strategy with firewalls, vulnerability scans and IDS. While these technologies are effective in their particular functions, they generate a tremendous amount of data, making it impossible to get a holistic view. These extreme customer environments generate more data and are more dynamic than traditional system and security management approaches can handle. Traditional database and SIEM approaches just don’t scale.

Our own Bill Hornish, who spent decades trying to implement these traditional approaches at several large agencies, has put together a really nice video explaining the challenges of risk management in Federal environments and how Splunk can help.

We’re learning a lot by working with these extreme customers and believe they can teach us a lot about what the rest of the Splunk community will eventually experience when applying IT Search to larger, more dynamic environments in the commercial sector as well.

The Splunk Platform Has Launched

Without a doubt the past week has been the most amazing week in Splunk history. The crazy coast to coast multi-city launch left us all exhausted and electrified. A few of the things that stick in my mind…

First, Splunk 3.2 including Splunk for Windows went live on our download page last Saturday, and more than 40% of our downloads in the past week have been for our new Windows version. Then Nick Selby of 451 Group wrote an analyst brief on us. He said, “Splunk is awesome: it’s multiplatform, easy to install and easy to use. And with an abstraction layer of logs, configuration files and system messages, traps and alerts, it’s seriously useful.” 451 has a reputation for ripping vendors, so we’re flattered.

Dana Gardner, an analyst with Interarbor, wrote a very eloquent analysis of our platform launch on ZDNet. “Splunk has created the means to offer developers easy access to that data and the powerful inferences gleaned from comprehensive IT search. That means the data can go places no log file has gone before,” says Dana. Developers are certainly doing some way cool things with Splunk.

I’ve seen a couple of neat visualization applications including this one called Replay. It shows you a live or time-lapsed view of your event streams. Here you can see the replay application hooked up to our internal wiki showing who’s doing what over a 24 hour period.


As for our own applications, the Splunk for PCI app drew tremendous interest at our series of Splunk Live events this past week. It’s just one example of how a business person with domain knowledge can package their own Splunk configuration as an application. If you haven’t seen Raffy’s video on the PCI Application, check it out here.


We also showed the Splunk for Change Management application. Seeing someone touch a file and watching the Splunk dashboard update instantaneously is an awesome display of how flexible Splunk has become. Check out the developer program for yourself and get your goods up on SplunkBase so we can all check ’em out.


Chaos & Insanity


Last week Splunk sponsored ComputerWorld’s Infrastructure World conference along with HP and IBM. I needed to come up with a talk and I wanted to do something new.

I’ve been thinking about how to describe the challenges we have managing all this changing technology and innovation. Note this is seriously a work in progress. I’m developing a theory that there are three fundamental drivers to data center chaos.

  • expectations,
  • complexity and
  • accountability

Any new business or consumer technology can quickly be met with significant expectations if it becomes successful. Our dependence on everything from wireless email to online travel reservation systems and hosted software as a service dramatically increases the expectation that these technologies will always be available, fast and do everything we want. Examples of failed expectations are everywhere. On June 20th, United Airlines canceled 24 flights and delayed another 286 flights due to a “computer gremlin.” Research in Motion recently experienced yet another 24 hour email outage, and more than 2.5M users were without service in North America. Salesforce.com, a pioneer of Software as a Service (SaaS), which is pitched as a more reliable alternative to running software yourself, continues to have outages as well.

Rising expectations, success and dependency force increased complexity in both scope and scale to meet demand. Scope complexity abounds as more and more features and capabilities are added to the services we depend on. I used an example of Citigroup’s internal SOA architecture that has five federated ESBs — one of every technology flavor. Scale complexity occurs as infrastructures grow so large they begin to stress under their own weight. Salesforce.com for example is now processing more than 90M transactions a day through their web interface and AppExchange platform. At a meager 10 messages per transaction that’s almost a billion messages a day going through the infrastructure. Wow. Imagine finding a needle in that haystack.
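
The back-of-the-envelope math is worth spelling out, because the per-second rate is what the infrastructure actually has to absorb. This uses only the figures quoted above (90M transactions a day, an assumed 10 messages per transaction) and an even spread over 24 hours, which real traffic certainly doesn’t follow, so peaks will be higher.

```python
transactions_per_day = 90_000_000   # figure quoted above
messages_per_transaction = 10       # the "meager" assumption above

messages_per_day = transactions_per_day * messages_per_transaction
seconds_per_day = 24 * 60 * 60
messages_per_second = messages_per_day / seconds_per_day  # assumes an even spread

print(f"{messages_per_day:,} messages/day, about {messages_per_second:,.0f} per second")
```

That works out to roughly ten thousand messages every second, all day, every day.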

Finally, once popularity rises and the technology becomes established, accountability arrives. Now we have to worry about how safe the technology is and, in many cases, monitor what people are doing with it. Everyone by now knows of the TJX situation where 45.7M credit and debit card numbers were stolen by hackers who somehow infiltrated its processing systems. The first card numbers were stolen three years ago and still there is no definitive explanation. Everything from cracked WEP keys and kiosks with tampered software to an insider job has been offered as a possible cause. More recently TD Ameritrade and Monster.com have experienced similar breaches of user and account information totaling into the millions. And compliance is everywhere. SOX, PCI, ITIL, HIPAA, FFIEC, FISMA, ISO, CoBIT, COSO and other mandates mean IT staff have reduced access to and visibility into the systems they’re trying to manage and keep running.

expectations + complexity + accountability = chaos

I’m interested in your thoughts on the direction this is taking. I’ll be sure to blog more later as the ideas develop.

Innovation Awards at Deutsche Bank

Yesterday I gave the keynote at the annual Deutsche Bank innovation awards ceremony in London. Once a year DB celebrates the innovators within the bank and awards prizes for the most entrepreneurial, cost reducing and revenue generating new inventions.

What a cool thing to do.

I have to admit speaking to a group like this is a bit different from my usual audiences of Linux geeks, network engineers, security jocks, and application developers. But it was really amazing to see how a global company promotes and rewards all kinds of innovative ideas and projects.

Compliance Interpretation Recipes

As a continuation on the compliance topic, let’s review some of the major mandates you might come across in IT. Some of these mandates are more prescriptive, like PCI and others are more widely open to interpretation, like SOX.

  • SOX is a securities regulation designed to ensure accurate financial reporting for public companies and companies preparing to go public.
  • PCI is a credit card privacy regulation to ensure credit cardholder data is protected. Anyone accepting credit cards or processing credit card payments must be concerned with PCI.
  • ITIL sets out specific IT process standards for IT service management best practices and frameworks. Organizations usually adopt it because IT wants to improve overall processes and efficiency.
  • HIPAA is a healthcare regulation designed to support the migration to electronic patient records while ensuring the privacy of those records through effective security controls. US healthcare providers and payers (insurers) need to pay attention to HIPAA.
  • FFIEC is a banking regulation to ensure banks don’t fail because of fraud. IT security is a small subset of the regulation. US banks are subject to FFIEC requirements.
  • DCID is a security regulation designed to ensure security of defense information systems. If you work for a US defense agency or contractor you probably have already heard of it.
  • NISPOM is a security regulation protecting security of classified networks. US government agencies and contractors with classified data fall under these guidelines.
  • FISMA is a security regulation designed to bolster computer and network security within the Federal Government agencies and government contractors.
  • ISO 17799 is a general set of IT process standards addressing overall risk management and controls. Organizations usually adopt it at the recommendation of their auditors.
  • CoBIT is a general set of IT process standards addressing overall risk management and controls. Organizations usually adopt it because their auditors call for it.
  • COSO is a general security standard addressing the management of risk associated with security breaches. Organizations usually adopt it because of their auditors.

So those are some of the major mandates. What can you do to understand their potential impact on you, your job and your career? Certainly there are lots of people who’ve written about compliance who know way more than I do. But I’ve been trying to boil it down to a few simple recipes.

For any mandate, make sure you understand its motivation and origin. Once you understand its motivation, your best recipe for success is to make that motivation your own. Educate your organization on what the mandate is designed to do. Create a climate in your organization where the mandate’s goals are also your goals. When auditors and courts see that you have adopted the spirit and not just the letter of the law, deficiencies tend to be treated leniently, as anomalies. Some mandates, including HIPAA and FFIEC, are pretty specific about the requirement to conduct an individualized risk assessment for a given organization relative to the mandate’s objectives and, based on that risk assessment, adopt a customized set of controls.

As we’ve seen with recent prosecutions of corporate malfeasance, it’s those individuals and organizations that take a cavalier attitude toward the law that are receiving the largest penalties.

Nearly every mandate you will face is motivated by one or more of a handful of common concerns, such as privacy protection and accurate financial reporting. You can gain leverage in a compliance program by adopting a consistent set of practices for multiple mandates sharing common goals.

Once you understand each mandate, you can identify specific controls using log data, which usually will fit into one of the following categories:

  • Monitoring IT data for security and operations issues.
  • Reporting on other controls using IT data.
  • Ad hoc search of log data for investigations & discovery requests.

1. Privacy Protection Recipe

Protecting customer, employee and consumer privacy is the motivator behind the security and privacy rules within the Health Insurance Portability and Accountability Act (HIPAA), which impacts all healthcare providers and payers, including companies that self-insure. The Gramm-Leach-Bliley Act (GLBA) has a similar concern but with consumer financial information. California’s SB-1386 is becoming a model for other states of a particularly aggressive form of privacy protection. And last but not least, the Payment Card Industry security standard (PCI), enforced by the credit card networks for any organization accepting payments by credit card, is an extremely specific program designed to protect consumer financial information.

Monitoring: Monitor for network intrusions and suspicious outgoing traffic.
Reporting: Report on access control and firewall events to prove these controls are in place and properly configured.
Ad hoc Search: Be able to investigate logs of access to data via applications, database queries and filesystem access. You may have to investigate any and all consumer reports claiming your organization mismanaged their data, which may involve hundreds of ad hoc searches a week if you’re a major consumer financial or healthcare organization.
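
As one illustration of the monitoring control, here is a minimal sketch for spotting suspicious outgoing traffic: sum the bytes each internal host sends to external addresses and flag hosts well above their usual volume. The (src, dst, bytes) flow tuples, the per-host baseline, the 5x factor and the 1 MB floor are all hypothetical choices; “internal” is approximated with Python’s ipaddress is_private check rather than a real network inventory.

```python
import ipaddress
from collections import defaultdict

def outbound_bytes(flows):
    """Sum bytes sent from private (internal) hosts to non-private destinations.
    Each flow is a hypothetical (src_ip, dst_ip, bytes) tuple."""
    totals = defaultdict(int)
    for src, dst, nbytes in flows:
        if ipaddress.ip_address(src).is_private and not ipaddress.ip_address(dst).is_private:
            totals[src] += nbytes
    return dict(totals)

def suspicious_hosts(today, baseline, factor=5.0, min_bytes=1_000_000):
    """Hosts whose outbound volume is at least min_bytes and more than
    `factor` times their usual (baseline) volume."""
    return sorted(host for host, sent in today.items()
                  if sent >= min_bytes and sent > factor * baseline.get(host, 0))

flows = [
    ("10.0.0.5", "8.8.8.8", 4_000_000),   # large upload to an external address
    ("10.0.0.5", "10.0.0.7", 9_000_000),  # internal traffic, ignored
    ("10.0.0.6", "1.1.1.1", 200_000),     # small, below the floor
]
baseline = {"10.0.0.5": 300_000, "10.0.0.6": 150_000}
print(suspicious_hosts(outbound_bytes(flows), baseline))
```

A host with no baseline at all is flagged as soon as it crosses the floor, which is usually what you want for a machine that has never talked to the outside world before.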

2. Financial Reporting Recipe

Ensuring fairness in financial markets is the motivation for Sarbanes-Oxley. The scope of concern relative to IT is the prevention and detection of financial reporting inaccuracies, fraud, and revenue-generating service interruptions. IT auditors are equally concerned with security and operations. Concerns range from an authorized user of a business system abusing their privilege to execute fraudulent transactions, to downtime of a revenue-generating system causing lost revenue. Data integrity and business continuity are of significant concern, while privacy and secrecy are not relevant.

Monitoring: Monitor for suspicious transaction patterns, data changes that bypass application logic, and system failures.
Reporting: Report on and review new kinds of events, system changes, and data changes.
Ad hoc Search: Ensure that developers can do ad hoc searches of logs without accessing production systems, as strict access controls will be in place.
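
“Data changes that bypass application logic” can be approximated from database audit logs: writes to financially significant tables should normally come only through the application’s own account. This sketch assumes hypothetical audit rows with db_user, table and statement fields and a hypothetical service account name; it flags any INSERT, UPDATE or DELETE on a sensitive table made by anyone else.

```python
APP_ACCOUNTS = {"app_svc"}   # hypothetical: DB accounts the business application uses
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE"}

def bypassing_changes(audit_rows, sensitive_tables):
    """Flag writes to sensitive tables made by accounts other than the application's."""
    flagged = []
    for row in audit_rows:
        parts = row["statement"].split(None, 1)
        verb = parts[0].upper() if parts else ""
        if (verb in WRITE_VERBS and row["table"] in sensitive_tables
                and row["db_user"] not in APP_ACCOUNTS):
            flagged.append(row)
    return flagged

# Hypothetical audit log rows.
audit_rows = [
    {"db_user": "app_svc", "table": "ledger", "statement": "UPDATE ledger SET amount = 10 WHERE id = 7"},
    {"db_user": "dba_jane", "table": "ledger", "statement": "update ledger SET amount = 0 WHERE id = 7"},
    {"db_user": "dba_jane", "table": "ledger", "statement": "SELECT * FROM ledger"},
    {"db_user": "dba_jane", "table": "tmp_load", "statement": "DELETE FROM tmp_load"},
]
for row in bypassing_changes(audit_rows, {"ledger"}):
    print(row["db_user"], row["statement"])
```

Only the direct update to the ledger by a DBA is flagged; reads and writes to scratch tables are left alone.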

Demystifying Compliance

Today I gave a talk at the Interop Data Center Summit happening during the Interop conference this week in Las Vegas. The talk was titled Demystifying Compliance. The goal was to dissect what compliance regulations and mandates mean for the future of IT. It was well attended, with roughly 375 people. Thanks to Andreas and Johna from Nemertes Research for inviting me to speak.

Interop was kind of a strange but interesting place to be talking about compliance. It’s traditionally a very networking-focused conference. Interop has its roots in proving the interoperability of various vendors’ technologies. Three days before the start of the show more than 40 vendors build a network from scratch. It’s sort of a living laboratory of networking technologies: wireless, wired, security, management etc.

But Interop has been growing up. The conference went through a “survival time” with the boom and bust of the networking market over the past few years. Now it’s leaner and meaner but healthy enough to start exploring things “up the stack” including security and yes, compliance.

It turns out not many people who attend Interop know much about compliance. I wasn’t even sure they’d be interested :) So I started out by trying to identify the top myths we get bombarded with about compliance and explore what the heck compliance really is and why IT people, in particular networking folks, should care.

Myth #1: Compliance equals regulations with specific actions.
False. The reality is that most regulations have fuzzy or no detail about IT implementation. The dirty little secret of compliance is that auditors are getting rich off each new mandate. Since most mandates are not prescriptive at all, they require complex interpretation for every business. Auditors can also be finicky. What worked this year or this quarter won’t necessarily work next year or next quarter.

Myth #2: Compliance is an IT security issue.
False. Most mandates are just as concerned with the integrity and availability of IT systems. Security, it turns out, is only part of what compliance mandate interpretation means for IT. In fact, mandates like SOX often place a much larger burden on IT because of the increased dependency on things like effective root cause analysis.

Myth #3: I have to store my original IT data for seven years.

False. Very few mandates specify data retention times. It used to be considered plausible to delete all your data after some period of time. Frank Quattrone and CSFB proved that theory wrong when the former investment banker was confronted with evidence of allegedly incriminating emails in a widely publicized series of trials. The bank’s policy was that all emails were deleted after 30 days and thus they could not produce materials for trial. Unfortunately, those pesky emails always have a way of showing up cached on another server or laptop somewhere. But the question remains: how long should I keep my data? Since mandates don’t spell out specific retention times and you can’t keep everything forever, what do you do?

Myth #4: A canned set of reports will make me compliant.

False. See Myth #1. The regulations almost never list a specific report.

Myth #5: I need to buy a commercial solution to be compliant.

False. Vendors don’t have any special insight into what will make you compliant. In fact the reports that vendors supply are typically developed by some junior product manager, sitting in a cubicle, trying to interpret what a mandate might mean for your industry and company. Needless to say, the auditors (internal and external) won’t buy off on that. How could they? They wouldn’t get paid!

What is compliance?

So if these are the myths, what is compliance? The interpretation of any set of compliance mandates really involves adhering to all the standards, policies and regulations that apply to a given organization in a given industry. Compliance is really an overlay to IT security, IT operations, HR, finance or any other business function. It is not a separate function in and of itself.

Compliance is usually driven by external laws and regulations (SOX) and internal policies (ITIL). Every business process in a mature organization has some compliance dimension.

Compliance is on the IT agenda largely because of a wave of accounting scandals exposing a lack of internal controls — Enron, Tyco and Andersen started a tidal wave of backlash for more checks and balances. More recently the perceived threat of electronic sabotage to critical infrastructures has lots of IT people thinking about compliance. A piece in the Washington Post pointed out for example the number of times a day hackers slip past security measures and break into the national energy grid. Scary stuff.

But there have also been a number of well-publicized incidents involving the theft of information and identities. 45.7M credit and debit card numbers stolen from TJX. 145,000 consumers’ personal data purchased by 50 fraudulent companies from Choicepoint. AT&T’s online store web site hacked and credit card information for up to 19,000 customers stolen.

Of course, the continued expansion of IT into every aspect of corporate, public and private life means our exposure to being ripped off, and the number of compliance mandates attempting to control the situation, will only grow.

Why do companies care about compliance?

Companies are starting to really care about compliance because newer legislation and regulations with real teeth are now being put into place. The Payment Card Industry (PCI) standard includes the possibility of fines of up to $500k per incident and possible termination as a merchant, meaning you can no longer take VISA or Mastercard. California’s SB 1386 forces disclosure of possible security breaches and identity theft to each individual consumer impacted, including notification by mail. And of course the lawyers are not asleep. Negligence lawsuits are now starting to set a higher standard for “duty of care.” A number of US banks are suing TJX over the costs they’ve incurred, and the courts appear to be very eager to hear the arguments.

Welcome!

I’m Michael Baum. Welcome to my blog.

I hope to find time to write about some of my favorite topics including:

  • Splunk and IT Search.
  • Technology gadgets and software — the stuff we all like to use.
  • Datacenter applications, servers, networks and security — the stuff we all have to keep running.
  • Business, entrepreneurship and venture capital.
  • Wall Street and investing.

Comments are always welcome and you can also reach me via email at thebaum (at) splunk (dot) com.