thebaumblog: Splunk 4

Splunk 4 Down Under

I visited Sydney and Melbourne last week to host our first Splunk Live events in Australia. It’s my first visit to Australia and I’m really blown away by the friendliness of the people we’ve met. And the “Australian for Grep” t-shirt finally had a proper home. Attendees at today’s event in Melbourne and Tuesday’s event in Sydney included an impressive list of current customers and partners, plus a number of new users evaluating Splunk for the first time, including Telstra, Ericsson, InfoSys, Frontline Systems, Fujitsu, GE Capital Finance, Toll Holdings, Vanguard Investments and more. We owe a huge thanks to the team from Digital Networks Australia who sponsored the two events.

Martin Brown, A Large Australian Financial Services Company

In Sydney Martin Brown, pictured below with me, gave an excellent presentation on using Splunk for Identity Management Compliance. Martin is a Technical Architect managing the development and operations of the world wide web application security system for a major financial institution. His career has evolved from implantable device electronics and software engineering through UNIX and network systems administration to internet systems management and security.

Martin’s company is required to present the client security history from its web applications and to be able to search that history for suspect IDs from the past six months. Tivoli Access Manager (TAM) is used for both external and internal identity management and access control. More than 200,000 clients authenticate externally through TAM.

His Splunk deployment is very much out of the box with a range of saved searches and some role partitioning. It consists of a single Splunk server with 1 TB of local disk for retention. The TAM logs are rsynced regularly and directly mounted from various hosts and systems. 12 internal and 12 external TAM hosts generate 5 GB/day of data, or roughly 1.8 TB of data a year.
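For a rough feel of what those numbers mean for retention, here is a back-of-the-envelope sketch. It works only from the figures quoted above and assumes raw log volume, so it ignores whatever compression and index overhead the real deployment sees. The constant names are mine, not Martin's.

```python
# Back-of-the-envelope sizing from the figures quoted in the post.
# Assumption: raw log volume only; compression and index overhead are ignored.
GB_PER_DAY = 5          # combined output of the 24 TAM hosts
DISK_GB = 1000          # 1 TB of local disk, taken as 1000 GB
RETENTION_DAYS = 183    # roughly six months

yearly_gb = GB_PER_DAY * 365
six_month_gb = GB_PER_DAY * RETENTION_DAYS
days_until_full = DISK_GB // GB_PER_DAY

print(f"per year: {yearly_gb} GB (~{yearly_gb / 1000:.1f} TB)")
print(f"six months: {six_month_gb} GB")
print(f"days of raw data that fit on disk: {days_until_full}")
```

On these assumptions the six-month window just about fits on the single server, which is consistent with the six-month lookback requirement.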

The current user base consists of business second level support teams and the TAM support group for third level support. The user base is expected to extend to the Risk Management Group and first level help desk support soon. Their classic use case is:

“Client X’s account has been compromised. What applications has he/she logged in to in the past 6 months?”

The old way required days or weeks of work and support from multiple teams. They often needed to pull in log files from offsite backup tapes, then grep through GBytes of data from several hosts. Fun fun. Now with Splunk Martin’s team finds answers in minutes and soon will train Tier 1 agents to do the same, eliminating the hassle of Martin’s team fetching data for everyone. Next he plans to add App server, Web Server and Load Balancer data, role partitioning to restrict business user access to relevant logs, off-shore implementations to present local application logs, and API consumption for a helpdesk one-stop-shop interface.
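To make the classic use case concrete, here is a minimal sketch of the question being asked, written as plain Python over already-parsed login events. This is not Martin's saved search and not the TAM log format; the field names (`client`, `app`, `action`, `time`) and the sample data are hypothetical.

```python
from datetime import datetime, timedelta

def apps_for_client(events, client_id, now, days=183):
    """Return the applications a client logged in to within the last `days` days."""
    cutoff = now - timedelta(days=days)
    apps = {
        e["app"]
        for e in events
        if e["client"] == client_id
        and e["action"] == "login"
        and e["time"] >= cutoff
    }
    return sorted(apps)

# Hypothetical sample events, standing in for parsed authentication logs.
now = datetime(2009, 9, 1)
events = [
    {"client": "X", "app": "banking", "action": "login", "time": datetime(2009, 8, 20)},
    {"client": "X", "app": "trading", "action": "login", "time": datetime(2009, 5, 2)},
    {"client": "X", "app": "payroll", "action": "login", "time": datetime(2008, 12, 1)},
    {"client": "Y", "app": "banking", "action": "login", "time": datetime(2009, 8, 21)},
]

print(apps_for_client(events, "X", now))
```

The payroll login falls outside the six-month window and client Y is filtered out, so only the banking and trading applications are reported for client X.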

Nick Clark, Ericsson

Nick Clark is a Technology Manager in the Solution Management & Utilities Consulting, System Integration & Multimedia practice with Ericsson where the focus is on bespoke support and life cycle management services for complex infrastructures. His group focuses on mobile and fixed network infrastructure, telecom services, software, broadband and multimedia solutions for operators, enterprises and the media industry. He presented the Splunk solution that Ericsson implemented at Telstra in the mobile multimedia services area to troubleshoot problems and investigate incidents. The solution was initially implemented to provide coverage of the 2008 Beijing Olympics. Telstra predicted massive interest in mobile streaming, yet demand exceeded all expectations. Splunk helped Ericsson and Telstra quickly pinpoint, manage and address problems. Because application failures and limits were discovered before they caused serious downtime, Telstra maintained an uptime above 99.9% during the Olympic Games.

Telstra manages more than 10M users and 50 plus content providers on the Telstra Service Delivery Platform providing multiple mobile portals, content transformation, mobile streaming services and device specific rendering and UI over 2G and 3G networks. The environment consists of 60+ servers (Solaris 9/10, Windows 2003) and many platforms and technologies providing service orchestration, rich media content management, encoding and streaming for terabytes of active content.

Ericsson and Telstra’s challenges before Splunk were numerous, including:

  • no central view of logs and events resulting in difficult to troubleshoot problems,
  • support and operations diverted to log fetching and ad-hoc reporting delaying work on high priority projects,
  • no consistent approach to log handling and storage making it difficult to locate, access and archive logs and
  • poor visibility of service and transaction flows extending outages.

The Ericsson team chose Splunk to help Telstra gain a holistic view of the environment, troubleshoot outages more quickly, provide users with ad-hoc reporting and control access to logs by role. They are currently indexing roughly 20GB per day on a dual processor, dual core Xeon GHz server with 16GB of RAM. 30 support people (tier 1 and up) currently Splunk application, server and network logs and events to troubleshoot problems. The team makes extensive use of Splunk tagging to create alerts that notify them when known problems recur. Perhaps the most valuable thing Ericsson has done with Splunk is track end-to-end transactions on the Service Delivery Platform. With one view across all services and transactions to track activities, the team can finally provide transaction level alerting and reporting.

Thank you again to Nick and Martin for presenting so well and Monsour, Martin and Sky with DNA who did a fantastic job and are representing Splunk very well down under.

Splunk 4 Lands in the Southwest

Last week we continued our road show launching Splunk 4 through the Southwestern US in Phoenix, San Diego and Los Angeles. This was our second annual gathering of customers, partners and users and we had more than double the attendees at this year’s Splunk Live events. In the morning we held a three-hour hands-on technical workshop. Attendees had the opportunity to install and configure Splunk 4 on their laptops or remote server and get one-on-one assistance from the Splunk team. Afternoon sessions and dinner focused on customer presentations. We’re very grateful to all the presenters who took time out of their busy days to share with everyone how Splunk is transforming their IT environments. I captured some notes from the week and thought I’d share them with you.

Early Warning

In Phoenix we had a packed house at the Sanctuary conference center on the side of Camelback Mountain. At 109 degrees I decided against hiking up it in the early AM. Dave Bridgeman, Data Security Engineer at Early Warning, kept things cool showing the audience how his company uses Splunk in their security operations center. Early Warning collaborates with major financial services companies to facilitate fraud detection through shared information and knowledge in cross-institution environments. The company has an interesting history, having spun out of First Data, and is now primarily owned by Bank of America, BB&T, JPMorgan Chase and Wells Fargo.

Dave is a well-rounded IT professional who started as a developer then moved into network and security management. He currently leads the data security team for Early Warning. The environment he oversees includes a variety of platforms including AS400s, MP300s, AIX, Solaris, Linux and Windows. He uses a combination of Splunk forwarders and syslog forwarders to collect Java and Cobol application logs and FTP/SFTP networking logs.

The Early Warning Splunk installation is designed to track transactions and users from one bank to the next in cross-institution activities. Transaction ID tracing correlates events across applications and services, and Splunk alerts the team when jobs fail so the operations and development teams can securely troubleshoot issues on the fly. And remote accessibility means no more driving into the office to access locked down servers in the middle of the night. On the security side of things Splunk helps Dave’s team track and monitor known fraudsters and bad user names, allowing them to stay vigilant when monitoring external attacks. They also use Splunk to deliver reports for customers, executive committee members and the Security Advisory Committee (with representatives from the founding banks).
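The idea behind transaction ID tracing can be sketched in a few lines of Python: group events from different sources by a shared ID, order each group by time, and flag any transaction that contains a failure. This is only an illustration of the correlation concept, not Early Warning's configuration; the field names (`txn_id`, `source`, `status`, `time`) and the sample events are hypothetical.

```python
from collections import defaultdict

def trace_transactions(events):
    """Group events by transaction ID and order each trace by timestamp."""
    traces = defaultdict(list)
    for event in events:
        traces[event["txn_id"]].append(event)
    for hops in traces.values():
        hops.sort(key=lambda e: e["time"])
    return dict(traces)

def failed_transactions(traces):
    """Return the IDs of transactions in which any step reported a failure."""
    return sorted(
        txn_id
        for txn_id, hops in traces.items()
        if any(h["status"] == "FAIL" for h in hops)
    )

# Hypothetical events from three different sources, deliberately out of order.
events = [
    {"txn_id": "T2", "source": "sftp", "status": "OK", "time": 3},
    {"txn_id": "T1", "source": "java-app", "status": "OK", "time": 1},
    {"txn_id": "T2", "source": "cobol-job", "status": "FAIL", "time": 5},
    {"txn_id": "T1", "source": "sftp", "status": "OK", "time": 2},
]

traces = trace_transactions(events)
print([h["source"] for h in traces["T1"]])
print(failed_transactions(traces))
```

Once events are keyed by transaction, alerting on a failed job becomes a matter of checking each trace rather than grepping each host separately.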

Amkor

Henry Grant of Amkor, a $2.1B provider of packaging/assembly and testing services for the semiconductor industry, also presented an overview of how his Corporate Data Center team uses Splunk. Henry oversees operations for the company’s SAP, PLM, Supply Chain, Hyperion and Oracle systems. Amkor has a heterogeneous environment of Sun Solaris, IBM iSeries, Cisco ASA firewalls, packaged and custom web and J2EE applications and TACACS/RADIUS accounting and access control technologies. With manufacturing locations in China, Japan, Korea, Taiwan, Singapore and The Philippines and headquarters in Chandler, AZ, the Amkor team is challenged with log and event data overload. GBs of data a day generated at multiple points make operational troubleshooting and security investigations extremely complex.

SOX Compliance

Proving SOX compliance has traditionally been handled by writing and maintaining scripts to collect and report on errors, access controls and log access activities. It was impossible to segregate duties given the lack of access control to the logs and events themselves. Splunk has taken the place of the awkward script writing and maintenance, collecting iSeries, Unix and application events and logs and providing automated scheduled reports. The team is now expanding the Splunk footprint to handle network and Oracle logs as well.

Application and System Monitoring

Like most enterprise IT shops, Amkor has figured out that traditional point monitoring tools aren’t enough as they have a hard time scaling to all the modern day technologies, require intrusive agents and only work for known events but don’t handle anomalies and unknowns. Too many issues end up being reported by end users themselves rather than the monitoring systems. With Splunk Henry’s team detects event anomalies in real time and has dramatically cut their response time by hours per incident.

Tools for the Help Desk

Sometimes it’s the simple things that can cut your response time, escalations and IT budget. The Amkor team noticed a lot of calls and emails regarding VPN set-up and access across the company. With Splunk, level 1 help desk agents are now able to resolve most of the VPN issues without creating an escalation. Henry’s team built a VPN dashboard driven by a series of searches and reports that gives entry level help desk personnel the insight they need to troubleshoot problems right away.

Henry’s Splunk Tips

The best part of Henry’s overview was his tips for a successful Splunk implementation. I’ve included the list here in hopes that these may help you as well.

  • Provide training that caters to each group’s need.
  • Utilize the Deployment Server.
  • Develop a Common Information Model.
  • Update and change as needed.
  • Use Tagging to Normalize Data.
  • Monitor Scheduled Compliance Reports by using the Audit Logs.
  • Splunk into your processes where possible.
  • Set up a Test/Dev Environment and a Test/Dev Index.

Intuit Consumer Group

The Intuit team of Jeff Ludwig, Chief Architect, and Larry Raab, Architect, of the Consumer Group joined us to share how they use Splunk in production support operations. Jeff leads the Consumer Group’s Connected Services Development for electronic and print tax and payroll filings for TurboTax, ProSeries, Lacerte and QuickBooks. Larry is a large-scale, highly available application and systems architect responsible for the consumer group applications and infrastructure.

While the original use for Splunk at Intuit was application management, Jeff and Larry covered three additional ways they have applied Splunk including reliable monitoring, improving user experience and large-scale reporting for compliance and business intelligence.

If Splunk Was An Animal What Would It Be?

Splunk 4 is out of the bag and the Splunk community and our customers are kicking the tires. I even saw several executives from other log management, SIEM and system management vendors register for and attend our world-wide webcast with a thousand attendees. And Twitter is all abuzz with questions, answers and some ass kicking. Yes Splunk 4 kicks ass. It is 2x faster on indexing and up to 10x faster searching. We have a fantastic new App framework where you can build custom views, dashboards and work flows, and there are countless other great improvements and new features. But sometimes we don’t get it completely right and you all let us know.

But back to my question, if Splunk was an animal what kind of animal would it be?

“Odd thing animals. All dogs look up to you. All cats look down to you. Only a pig looks at you as an equal.”

- Winston Churchill

I read that quote today at the birthplace of Winston Churchill and it reminded me that Splunk is like a pig. We’ve always looked our users and customers straight in the eye with the good and the not so good. This has always been the transparent way we conduct business. So keep the feedback coming - the praise and the criticism.

One of the areas that I’m especially interested in hearing about is our new App focus. We are in the very early stages of creating Splunk Apps and making them available to the Splunk community. Some are free Apps and some are premium Apps. The free apps are available for immediate download. The premium Apps you need to talk with us about so we can work with you on an installation. At some point we plan to have trial versions of the premium Apps available for download too.

The free Apps include things like

You can easily download the App .spl file, drop it into your splunk/etc/apps directory and check it out. Even more easily, you can download and launch the Apps right from your Splunk Launcher screen (which is an App too). We’re working on fully documenting all these Apps so if you need help now feel free to contact us via support@splunk.com. You can also select “Send Feedback…” on the first menu of the App to contact the specific App team directly via email. We’re especially interested in what doesn’t work, where you get stuck and what else you’d like to see. Several of these Apps are still beta versions so feedback sooner rather than later is much appreciated.

Happy Splunk4ing!