thebaumblog: Microsoft

SplunkLive Seattle Kicks IT

On what was an incredibly beautiful day we had more than 100 Splunk devotees attend our first ever SplunkLive event in Seattle last week. In the shadow of Microsoft we talked about our Windows and Microsoft strategy and compared notes with lots of customers that are running mixed Microsoft, Linux and Solaris environments. Many of our customers with Microsoft Active Directory, Exchange and SharePoint environments are utilizing Splunk to troubleshoot problems and implement security and compliance controls in large-scale, distributed environments. But I’m still surprised at how little Microsoft .NET we’re seeing in large-scale production applications.

Three Seattle-based customers presented their views on managing mission critical applications, IT data consolidation and Splunk.

  • T-Mobile USA
  • Blue Nile
  • Washington State University

T-Mobile USA

Sean White, Senior Engineer with T-Mobile Operations in Bellevue, talked with us about their global rollout of Splunk. Sean is a member of the security engineering team charged with incident response, IDS, vulnerability scanning, anti-virus and enterprise unified logging. He graduated with a B.S. in Computer Science from University of Kansas and has a deep background in large telecom environments, initially as a system administrator and webmaster, SS7 network C&C and performance, engineering and now in information security. Sean has been at T-Mobile for 4 years, and before that worked at Cingular and AT&T Wireless. T-Mobile USA is the 4th largest US national provider of wireless voice, messaging, and data services to 34M subscribers with annual revenues of $17B. T-Mobile USA is the US operating entity of T-Mobile International AG, the mobile communications subsidiary of Deutsche Telekom AG (NYSE: DT). Deutsche Telekom is one of the largest telecommunications companies in the world, with nearly 120 million customers worldwide.

It all started with PCI Compliance

Like many of our enterprise customers, T-Mobile started working with Splunk in one area but quickly saw the value of expanding into others. For Sean and his team, PCI Compliance was the beginning of the Splunk solution footprint, but soon everyone realized the consolidation of logs, events, messages, configurations and changes meant a whole lot more.

Beginning with proving PCI compliance, T-Mobile has very specific requirements. PCI Section 10: Track and monitor all access to network resources handling cardholder data. But in T-Mobile’s case scale was a big issue. Fulfilling PCI DSS Section 10 meant tracking 26+ in-scope applications and the ability to trace transactions from start to finish across 650+ servers running Windows, Linux and Unix varieties. It also means more than 100 individuals logging into Splunk on a daily basis as part of the process.

The Splunk Set-up

The Splunk configuration consists of

  • Pairs of forwarders set up in each of 4 geographic locations.
  • Three short-term indexers + 1 short-term search box.
  • Three long-term search boxes hooked into a 32 TB NAS.
  • Centrally controlled from a single deployment server.

The current installation is indexing more than 600GB/day of data and has just passed the 10B event mark. Controlling access to all this data is critical and T-Mobile has Splunk roles set up for managers and application teams to limit access to subsets of the data. The ability to segregate data access along lines of duties is critical to prove PCI compliance.

The Business Case for a SOC

In addition to proving PCI Compliance, T-Mobile has discovered Splunk’s use for Security as well. Not long ago, a SIEM vendor would have told you IDS and firewall logs were all you need. That >=2 sources of data == correlation. Not so much.

“All the best new vulnerabilities are coming in on the application layer.”
- Sean White

Enterprise logging, meaning visibility into all of your IT data, is absolutely critical in defending against modern blended attacks. At T-Mobile, Splunk has become a primary analysis tool for deciphering what is happening to the applications, servers and devices on the network. With a few saved searches, Splunk does real correlation.

Nothing Boring about Logs and IT Data!

PCI Compliance mandates gave T-Mobile the excuse (read: funding) to start an enterprise logging initiative. Logging all security, network and application events can truly give the insight needed not only to measure and report on compliance controls but also to run a more secure and effective business. T-Mobile has also discovered that the ability to ask any question of their environment and get immediate answers provides a pile of value to help desk operations and business intelligence functions.

“All the information about your company is in your logs—there’s nothing boring about it.”


Blue Nile

Jerry Brennock, Director Core Development at Blue Nile, explained how the company is using Splunk to improve the experience of buying diamonds over the Web. Blue Nile, Inc. is an online retailer of diamonds and fine jewelry offering in-depth educational materials and unique online tools that place consumers in control of the jewelry shopping process. Importantly, the focus is on giving customers a great experience at a great price – this translates to requiring high quality at a low cost. Jerry’s team builds and supports the infrastructure and applications for merchandising and marketing, including the website. He’s been with Blue Nile for 10 years and in the e-commerce space for more than 17.

The Killer Diamond App

Diamond Search is undoubtedly the killer application for Blue Nile’s e-commerce experience. It’s an asynchronous JavaScript app that has to work across any browser and there are many non-obvious use cases. All three of these factors mean it is prone to failure in lots of edge cases.

“If this application isn’t fast and accurate, we don’t sell diamonds.”
- Jerry Brennock

Jerry’s team has embedded tracking pixels with name-value pairs to track JavaScript profile information from each diamond search. This, together with Web server 500 and 404 errors, gives the development, operations and customer support teams all the data they need to troubleshoot problems. The challenge is finding customer problems “in the moment” before the sale is lost.
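To make the tracking-pixel idea a little more concrete, here is a minimal sketch of pulling the name-value pairs out of a pixel request as it might appear in a web server access log. The `/px.gif` path and the field names (`search_ms`, `browser`, `results`) are made up for illustration; they are not Blue Nile’s actual scheme.

```python
from urllib.parse import urlsplit, parse_qs


def parse_pixel_request(request_line):
    """Return the name-value pairs carried by a tracking-pixel request."""
    # request_line looks like: 'GET /px.gif?a=1&b=2 HTTP/1.1'
    _method, target, _protocol = request_line.split(" ", 2)
    query = urlsplit(target).query
    # parse_qs returns a list of values per name; keep the first one
    return {name: values[0] for name, values in parse_qs(query).items()}


def is_error_status(status):
    """Flag the 404 and 500 responses called out in the text."""
    return status in (404, 500)


# Hypothetical pixel request emitted by one diamond search
sample = "GET /px.gif?search_ms=840&browser=firefox&results=25 HTTP/1.1"
fields = parse_pixel_request(sample)
```

Once the pairs are extracted as fields, slow searches or browser-specific failures can be found with ordinary searches over those fields alongside the 404 and 500 events.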

Splunk 4 Lands in the Southwest

Last week we continued our road show launching Splunk 4 through the Southwestern US in Phoenix, San Diego and Los Angeles. This was our second annual gathering of customers, partners and users and we had more than double the attendees at this year’s SplunkLive events. In the morning we held a three-hour hands-on technical workshop. Attendees had the opportunity to install and configure Splunk 4 on their laptops or remote server and get one-on-one assistance from the Splunk team. Afternoon sessions and dinner focused on customer presentations. We’re very grateful to all the presenters who took time out of their busy days to share with everyone how Splunk is transforming their IT environments. I captured some notes from the week and thought I’d share them with you.

Early Warning

In Phoenix we had a packed house at the Sanctuary conference center on the side of Camelback Mountain. At 109 degrees I decided against hiking up it in the early AM. Dave Bridgeman, Data Security Engineer at Early Warning, kept things cool showing the audience how his company uses Splunk in its security operations center. Early Warning collaborates with major financial services companies to facilitate fraud detection through shared information and knowledge in cross-institution environments. The company has an interesting history, having spun out of First Data, and is now primarily owned by Bank of America, BB&T, JPMorgan Chase and Wells Fargo.

Dave is a well-rounded IT professional who started as a developer then moved into network and security management. He currently leads the data security team for Early Warning. The environment he oversees includes a variety of platforms including AS400s, MP300s, AIX, Solaris, Linux and Windows. He uses a combination of Splunk forwarders and syslog forwarders to collect Java and Cobol application logs and FTP/SFTP networking logs.

The Early Warning Splunk installation is designed to track transactions and users from one bank to the next in cross-institution activities. Transaction ID tracing correlates events across applications and services, and Splunk alerts the team when jobs fail so the operations and development teams can securely troubleshoot issues on the fly. And remote accessibility means no more driving into the office to access locked down servers in the middle of the night. On the security side of things, Splunk helps Dave’s team track and monitor known fraudsters and bad user names, allowing them to stay vigilant when monitoring external attacks. They also use Splunk to deliver reports for customers, executive committee members and the Security Advisory Committee (with representatives from the founding banks).
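For readers who haven’t done transaction tracing before, the idea can be sketched in a few lines: group events from different sources by a shared transaction ID, order them by time, and flag any transaction with a failed step. This is only an illustration of the concept, not how Splunk or Early Warning implements it; the event fields (`txn_id`, `time`, `source`, `status`) are assumed for the example.

```python
from collections import defaultdict


def trace_transactions(events):
    """Group events by transaction ID, each group ordered by timestamp."""
    traces = defaultdict(list)
    for event in events:
        traces[event["txn_id"]].append(event)
    for txn_events in traces.values():
        txn_events.sort(key=lambda e: e["time"])
    return dict(traces)


def failed_transactions(traces):
    """Return the IDs of transactions with at least one failed step."""
    return sorted(
        txn_id
        for txn_id, txn_events in traces.items()
        if any(e["status"] == "failed" for e in txn_events)
    )


# Hypothetical events from three different log sources
events = [
    {"time": 3, "txn_id": "T2", "source": "sftp", "status": "failed"},
    {"time": 1, "txn_id": "T1", "source": "java_app", "status": "ok"},
    {"time": 2, "txn_id": "T2", "source": "cobol_batch", "status": "ok"},
    {"time": 4, "txn_id": "T1", "source": "sftp", "status": "ok"},
]
traces = trace_transactions(events)
alerts = failed_transactions(traces)
```

The list in `alerts` is what would drive a notification to the operations and development teams.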

Amkor

Henry Grant of Amkor, a $2.1B provider of packaging/assembly and testing services for the semiconductor industry, also presented an overview of how his Corporate Data Center team uses Splunk. Henry oversees operations for the company’s SAP, PLM, Supply Chain, Hyperion and Oracle systems. Amkor has a heterogeneous environment of Sun Solaris, IBM iSeries, Cisco ASA firewalls, packaged and custom web and J2EE applications and TACACS/RADIUS accounting and access control technologies. With manufacturing locations in China, Japan, Korea, Taiwan, Singapore and the Philippines and headquarters in Chandler, AZ, the Amkor team is challenged with log and event data overload. Gigabytes of data a day generated at multiple points make operational troubleshooting and security investigations extremely complex.

SOX Compliance

Proving SOX compliance has traditionally been handled by writing and maintaining scripts to collect and report on errors, access controls and log access activities. It was impossible to segregate duties given the lack of access control to the logs and events themselves. Splunk has taken the place of the awkward script writing and maintenance, collecting iSeries, Unix and application events and logs and providing automated scheduled reports. The team is now expanding the Splunk footprint to handle network and Oracle logs as well.

Application and System Monitoring

Like most enterprise IT shops, Amkor has figured out that traditional point monitoring tools aren’t enough: they have a hard time scaling to all the modern-day technologies, require intrusive agents, and only work for known events rather than anomalies and unknowns. Too many issues end up being reported by end users themselves rather than the monitoring systems. With Splunk, Henry’s team detects event anomalies in real time and has cut their response time by hours per incident.

Tools for the Help Desk

Sometimes it’s the simple things that can cut your response time, escalations and IT budget. The Amkor team noticed a lot of calls and emails regarding VPN set-up and access across the company. With Splunk, level 1 help desk agents are now able to resolve most of the VPN issues without creating an escalation. Henry’s team built a VPN dashboard driven by a series of searches and reports that gives entry-level help desk personnel the insight they need to troubleshoot problems right away.

Henry’s Splunk Tips

The best part of Henry’s overview was the tips for a successful Splunk implementation. I’ve included the list here in hopes that these may help you as well.

  • Provide training that caters to each group’s need.
  • Utilize the deployment Server.
  • Develop a Common Information Model.
  • Update and change as needed.
  • Use Tagging to Normalize Data.
  • Monitor Scheduled Compliance Reports by using the Audit Logs.
  • Integrate Splunk into your processes where possible.
  • Setup Test/Dev Environment and a Test/Dev Index.

Intuit Consumer Group

The Intuit team of Jeff Ludwig, Chief Architect, and Larry Raab, Architect of the Consumer Group, joined us to share how they use Splunk in production support operations. Jeff leads the Consumer Group’s Connected Services Development for electronic and print tax and payroll filings for TurboTax, ProSeries, Lacerte and QuickBooks. Larry is a large-scale, highly available application and systems architect responsible for the consumer group applications and infrastructure.

While the original use for Splunk at Intuit was application management, Jeff and Larry covered three additional ways they have applied Splunk including reliable monitoring, improving user experience and large-scale reporting for compliance and business intelligence.

If Splunk Was An Animal What Would It Be?

Splunk 4 is out of the bag and the Splunk community and our customers are kicking the tires. I even saw several executives from other log management, SIEM and system management vendors register for and attend our worldwide webcast, which drew a thousand attendees. And Twitter is all abuzz with questions, answers and some ass kicking. Yes, Splunk 4 kicks ass. It is 2x faster on indexing and up to 10x faster searching. We have a fantastic new App framework where you can build custom views, dashboards and work flows, and there are countless other great improvements and new features. But sometimes we don’t get it completely right and you all let us know.

But back to my question, if Splunk was an animal what kind of animal would it be?

“Odd thing animals. All dogs look up to you. All cats look down to you. Only a pig looks at you as an equal.”

- Winston Churchill

I read that quote today at the birthplace of Winston Churchill and it reminded me that Splunk is like a pig. We’ve always looked our users and customers straight in the eye with the good and the not so good. This has always been the transparent way we conduct business. So keep the feedback coming - the praise and the criticism.

One of the areas that I’m especially interested in hearing about is our new App focus. We are in the very early stages of creating Splunk Apps and making them available to the Splunk community. Some are free Apps and some are premium Apps. The free apps are available for immediate download. The premium Apps you need to talk with us about so we can work with you on an installation. At some point we plan to have trial versions of the premium Apps available for download too.

The free Apps include things like

You can easily download the App .spl file, drop it into your splunk/etc/apps directory and check it out. More easily you can download and launch the Apps right from your Splunk Launcher screen (which is an App too). We’re working on fully documenting all these Apps so if you need help now feel free to contact us via support@splunk.com. You can also select “Send Feedback…” on the first menu of the App to contact the specific App team directly via email. We’re especially interested in what doesn’t work, where you get stuck and what else you’d like to see. Several of these Apps are still beta versions so feedback sooner rather than later is much appreciated.

Happy Splunk4ing!

New Splunk Apps Launch at Interop and MMS

This week we were rolling in Las Vegas with Interop at one end of the strip and the Microsoft Management Summit at the other end.

At Interop we launched the Splunk for Change Management app. And at MMS the Splunk for Windows Management app made its debut.

Both apps make use of the Splunk Platform which provides a common set of services and APIs making it easy to create and integrate applications that leverage vast amounts of IT data. These are the second and third applications in a series of new releases we’ll be doing this year. Splunk for PCI was the first app launched last quarter.

Splunk for Change Management App

Splunk for Change Management takes advantage of the fact that we index not just logs but configurations and file system changes as well. It also leverages a little known (but I think soon to be much more popular) Splunk search command called diff. Diff lets you easily compare two search results and returns a single result that is the difference between the two. You can compare values of specific fields of results as well as every line of multi-line events and files. This makes it really easy to compare configurations across lots of locations. Splunk for Change Management leverages these capabilities and brings integrated change audit, change detection and change validation.
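If you want a feel for what a line-by-line comparison of two configurations produces, here is a small sketch using Python’s standard `difflib` module. To be clear, this is not the Splunk diff command or its implementation, just the same idea applied to two made-up configuration snapshots; the host names and settings are hypothetical.

```python
import difflib


def config_diff(old_text, new_text, old_name="before", new_name="after"):
    """Return a unified diff of two multi-line configuration snapshots."""
    lines = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",
    )
    return "\n".join(lines)


# Hypothetical configuration captured from two different hosts
host_a = "port = 8089\nmax_threads = 16\nssl = true"
host_b = "port = 8089\nmax_threads = 32\nssl = true"

result = config_diff(host_a, host_b, "host_a", "host_b")
print(result)
```

Lines prefixed with `-` exist only in the first snapshot and lines prefixed with `+` only in the second, so a single result shows exactly where the two hosts have drifted apart.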

Now you can detect unauthorized changes by indexing your trouble tickets and ticketing system logs together with your service, device and application events and configurations. We use Jira internally and find indexing our Jira tickets enables us to immediately know if a change was authorized or not. No more jumping between redundant and siloed consoles searching for the answer or writing all kinds of complicated data transformation scripts to compare the output of different management systems.
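The correlation behind this is simple to sketch: a change is considered authorized only if an approved ticket covers the same host during the time the change happened. The snippet below is a rough illustration under that assumption; the ticket and change fields (`host`, `approved`, `window_start`, `window_end`, `time`) are invented for the example and do not reflect Jira’s or Splunk’s actual data model.

```python
def find_unauthorized_changes(change_events, tickets):
    """Return change events that no approved ticket covers."""
    unauthorized = []
    for change in change_events:
        covered = any(
            ticket["approved"]
            and ticket["host"] == change["host"]
            and ticket["window_start"] <= change["time"] <= ticket["window_end"]
            for ticket in tickets
        )
        if not covered:
            unauthorized.append(change)
    return unauthorized


# Hypothetical tickets and detected configuration changes
tickets = [
    {"id": "CHG-1", "host": "web01", "approved": True,
     "window_start": 100, "window_end": 200},
    {"id": "CHG-2", "host": "db01", "approved": False,
     "window_start": 100, "window_end": 200},
]
changes = [
    {"host": "web01", "time": 150, "file": "httpd.conf"},   # inside approved window
    {"host": "web01", "time": 250, "file": "httpd.conf"},   # outside the window
    {"host": "db01", "time": 150, "file": "listener.ora"},  # ticket not approved
]
suspect = find_unauthorized_changes(changes, tickets)
```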

And for the first time we introduce to the industry the concept of Change Validation. Today many of us have the ability to blast out patches to hundreds of servers and devices automatically. But how do we know that the changes had the desired effect? By observing the state and events generated by the actual patched systems we can now compare the before and after actual behavior. Splunk brings change audit events and configuration data together with activity and error logs so you can connect change with actual system and user behavior.
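One simple way to picture change validation is to count errors in equal time windows on either side of a patch and compare them. The sketch below does exactly that; it is an illustration of the before-and-after idea only, and the event fields (`time`, `level`), the window size and the "fewer errors means improved" rule are all assumptions made for the example.

```python
def validate_change(events, patch_time, window):
    """Compare error counts in equal windows before and after a patch."""
    before = sum(
        1 for e in events
        if e["level"] == "ERROR" and patch_time - window <= e["time"] < patch_time
    )
    after = sum(
        1 for e in events
        if e["level"] == "ERROR" and patch_time <= e["time"] < patch_time + window
    )
    return {
        "errors_before": before,
        "errors_after": after,
        "improved": after < before,
    }


# Hypothetical events from one patched server; the patch lands at time 100
events = [
    {"time": 91, "level": "ERROR"},
    {"time": 95, "level": "ERROR"},
    {"time": 98, "level": "ERROR"},
    {"time": 103, "level": "INFO"},
    {"time": 105, "level": "ERROR"},
]
report = validate_change(events, patch_time=100, window=10)
```

In practice you would want to look at more than raw error counts (latency, user activity, specific error types), but the before-and-after comparison is the core of it.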

The app includes:

  • Out-of-the-box dashboards with over 40 reports showing changes across all datacenter components including applications, servers and network devices.
  • Predefined alerts that detect unauthorized change on the basis of configuration variances and correlation with service desk systems.
  • Predefined searches to help identify service-impacting changes quickly.
  • Integration with service desk systems to close the loop on change management by validating the effect of change on system behavior.

Splunk for Windows Management App

This new app integrates Microsoft’s System Center Operations Manager’s command-and-control view of a Windows infrastructure with Splunk’s IT Search. The latest version of Splunk now indexes all IT data generated by Windows servers and applications — event logs, registry keys, performance metrics and application log files. Everything is searchable from a single place to resolve service-impacting incidents faster, enhance monitoring coverage, and validate service levels.

What’s really cool is Splunk searches can be launched through Tasks in the System Center Operations Manager Console on any aspect of the infrastructure being monitored, and can be expanded to include far-flung elements of the IT infrastructure for additional context – regardless of platform or technology. It’s super fast to identify information across the Windows Event Log, the Windows

Welcome!

I’m Michael Baum. Welcome to my blog.

I hope to find time to write about some of my favorite topics including:

  • Splunk and IT Search.
  • Technology gadgets and software — the stuff we all like to use.
  • Datacenter applications, servers, networks and security — the stuff we all have to keep running.
  • Business, entrepreneurship and venture capital.
  • Wall Street and investing.

Comments are always welcome and you can also reach me via email at thebaum (at) splunk (dot) com.