thebaumblog: Splunk Apps

Splunk Live Princeton 2009

Wednesday and we’re at Splunk Live Princeton, NJ. What an awesome place. Princeton is home to a great university and some great culinary experiences. Check out Mediterra — an interesting mix of Italian and Spanish influences. Apparently it’s where all the Princeton parents treat their kids to dinner when they are in town. Next store to our venue was the great hope for the state of NJ — a new Governor. The current Governor has turned the state budget and tax base into toxic waste. Well things went much better for the more than 60 Splunk Live attendees in Princeton today, who gained insight into how a number of large Splunk customers keep their mission critical applications running in a time of IT budget slash and burn.

Matthew Stevens, Director Software Systems and Architecture at Comcast provides guidance to Comcast executives on mission critical media systems and strategic systems architecture. Comcast is the country’s largest provider of cable services serving 23.9 million cable customers, 15.3 million high-speed Internet customers and 7.0 million Comcast Digital Voice customers.

Comcast Developer Network

Matthew’s latest project is the Comcast Developers Network a Comcast-scale secure web services platform for the development of cool new media and entertainment offerings. The Comcast Web Platform environment generates of billions of software events each day from caching and load-balancing, origin application servers, databases, middleware and content delivery networks for images and video streams. Comcast services demand high quality. Much of the Comcast content is exclusive and premium services drive revenue. Interfaces between technology components (applications, delivery platforms) need to adhere to best practices to ensure the highest degree of end customer experience.

Why Splunk?

Comcast has acquired many system and application management platforms over the years, but nothing was providing the team with the robust information from operational telemetry the teams around the company need to ensure data integrity, stability, application quality and efficiency. Several efforts specifically drove Comcast to consider and deploy Splunk.

  • Product rollout: The team wanted the ability to predict and correct potential issues before going live into into production—Splunk has become a required best practice for new product rollouts.
  • Network/ System Integrity: Understanding security and user experience across a very large network and set of systems is a must to protect the business. Splunk provides the insight the network and system teams need across many different silos of technologies.
  • Business Intelligence: Having immediate access to real-time events and historical trends allows the various Comcast business teams to react quickly and adapt to changing customer behaviors.
  • Agility: Alerts and Dashboards indicate discrepancies so distributed teams can investigate immediately and remediate failures and attacks.

Video CDN/CMS Performance

“In content management systems and delivery networks a devil walks the long tail. If you’re facing concurrent hits across the tail of the curve, sharpen your pencil, you’ve got problems!”

Splunk helps Comcast understand the risks of instability in our systems, especially during periods of high concurrency. Through pre-production modeling of even patterns and subsequent monitoring of these patterns Splunk pays for itself by helping Comcast avoid deployment of vulnerable systems, downtime, and upset customers.

Predicting System Imbalance

Comcast has successfully used Splunk to evaluate potential infrastructure vendor’s solutions and determine if they will balance loads properly across a large, indeterminate infrastructure. Often the answer is no as illustrated here in a Splunk report of resource utilization across various services.

Splunk has also been utilized to see whether solutions will be resilient to different traffic patterns, helping the company perform predictive analysis before making critical infrastructure investments.

Load testing is performed during non peak hours and the results are analyzed for system failures over time using the telemetry data Splunk can correlated across various logs, messages and events.

When failures are found the Comcast team uses Splunk reports to dig deeper into the data.


Security and Compliance

In addition to operations use cases, Comcast security and compliance teams leverage the consolidated logs across data centers to enable faster threat assessment and security monitoring.

  • Monitoring for bad actors to trigger alerts,
  • Conducting threat detection over time,
  • Detecting attacks/vulnerabilities in systems and
  • Auditing systems in support of security assessments and compliance.

What’s Next?

Next up for Matthew and team is the launch of the Comcast CodeBig Platform enabling a network of developers to create content for the network. Some of these developers are already using Splunk in their own managed services like Mashery. Comcast is working to hook the Mashery Splunk installation to their own in-order to provide visibility across multiple services and providers of content and entertainment functionality.

Chris Abboud manages the Enterprise Systems Management team at Dow Jones — monitoring customer facing infrastructure and applications. Dow Jones provides global business news and information services to millions of consumers and enterprise media groups. Keeping these revenue generating services running 7×24x365 is the highest priority. Chris also manages the DJ service management platforms (Remedy, Knowledge Base, etc.) He’s been with the DJ organization for 10 years, in current role for 3 years.

“Our mission is to address issues before they become service impacting events. Failures are going to happen — we need to make sure people know about them as soon as possible.”

The Splunk Set-up

The Dow Jones Splunk installation includes

  • Data from 6000+ servers globally,
  • 13,500 + source types,
  • 1,700 network devices (primarily Cisco and Juniper) and
  • Ten distributed Splunk servers in difference geographies index ~100GB a day and provide a new global logging console.

Why Splunk?

Each Dow Jones command center now has the ability to know what’s happening before customers do across a wide range of internal and external services. Splunk speeds the time to resolution for email outages that may impact internal users’ productivity and editorial sites downtime that can directly impact to customer service and revenue. Dow Jones has found Splunk generates significantly fewer false positives than traditional monitoring systems and new resources are much easier to manage and deploy.

Splunk 4 Lands in the Southwest

Last week we continued our road show launching Splunk 4 through the Southwestern US in Phoenix, San Diego and Los Angeles.This was our second annual gathering of customers, partners and users and we had more than double the attendees at this year’s Splunk Live events. In the morning we held a three-hour hands on technical workshop. Attendees had the opportunity to install and configure Splunk 4 on their laptops or remote server and get one-on-one assistance from the Splunk team. Afternoon sessions and dinner focused on customer presentations. We’re very grateful to all the presenters who took time out of their busy days to share with everyone how Splunk is transforming their IT environments. I captured some notes from the week and thought I’d share them with you.

Early Warning

In Phoenix we had a packed house at the Sanctuary conference center on the side of Camel Back Mountain. At 109 degrees I decided against hiking up it in the early AM. Dave Bridgeman, Data Security Engineer at Early Warning kept things cool showing the audience how his company’s use of Splunk in their security operations center. Early Warning collaborates with major financial services companies to facilitate fraud detection through shared information and knowledge in cross-institution environments. The company has an interesting history having spun out of First Data and is now primarily owned by Bank of America, BB&T, JPMorgan Chase and Wells Fargo.

Dave is a well rounded IT professional who started as a developer then moved into network and security management. He current leads the data security team for Early Warning. The environment he over sees includes a variety of platforms including AS400s, MP300s, AIX, Solaris, Linux and Windows. He uses a combination of Splunk forwarders and syslog forwarders to collect Java and Cobol application logs and FTP/SFTP networking logs.

The Early Warning Splunk installation is designed to track transactions and users from one bank to the next in cross-institution activities. Transaction ID tracing correlates events across applications and services and Splunk alerts the team when jobs fail so the operations and development teams can securely troubleshoot issues on the fly. And remote accessibility mean no more driving into the office to access locked down servers in the middle of the night. On the security side of things Splunk helps Dave’s team track and monitor known fraudsters and bad user names allowing them to stay vigilant when monitoring external attacks. They also use Splunk to deliver reports for customers, executive committee members and the Security Advisory Committee (with representatives from the founding banks).

Amkor

Henry Grant of Amkor a $2.1B provider of packaging/assembly and testing services for the semiconductor industry also presented an overview of how his Corporate Data Center team uses Splunk. Henry overseas operations for the company’s SAP, PLM, Supply Chain, Hyperion and Oracle systems. Amkor has a heterogeneous environment of Sun Solaris, IBM iSeries, Cisco ASA firewalls, packaged and custom web and J2EE applications and TACAS/Radius accounting and access control technologies. With manufacturing locations in China, Japan, Korea, Taiwan, Singapore and The Philippines and headquarters in Chandler, AZ, the Amkor team is challenged with log and event data overload. GBs of data a day generated at multiple points makes operational troubleshooting and security investigations extremely complex.

SOX Compliance

Proving SOX compliance has traditionally been handled by writing and maintaining scripts to collect and report on errors, access controls and log access activities. It was impossible to segregate duties given the lack of access control to the logs and events themselves. Splunk has taken the place of the awkward script writing and maintenance to collect iSeries, Unix and application events and logs and provide automated schedule reports. The team is now expanding the Splunk footprint to handle network and Oracle logs as well.

Application and System Monitoring

Like most enterprise IT shops, Amkor has figured out that traditional point monitoring tools aren’t enough as they have a hard time scaling to all the modern day technologies, require intrusive agents and only work for known events but don’t handle anomalies and unknowns. Too many issues end up being reported by end users themselves rather than the monitoring systems. With Splunk Henry’s team detects event anomalies in real time and has dramatically cut their response time by hours per incident.

Tools for the Help Desk

Sometimes it’s the simple things that can cut your response time, escalations and IT budget. The Amkor team noticed a lot of calls and emails regarding VPN set-up and access across the company. With Splunk level 1 help desk agents are now able to resolve most of the VPN issues without creating an escalation. Henry’s team built a VPN dashboard driven by a series of searches and reports that gives entry level help desk personnel the insight they need to troubleshoot problems right away.

Henry’s Splunk Tips

The best part of Henry’s overview were the tips for a successful Splunk implementation. I’ve included the list here in hopes that these may help you as well.

  • Provide training that caters to each group’s need.
  • Utilize the deployment Server.
  • Develop a Common Information Model.
  • Update and change as needed.
  • Use Tagging to Normalize Data.
  • Monitor Scheduled Compliance Reports by using the Audit Logs.
  • Splunk into your processes where possible.
  • Setup Test/Dev Environment and a Test/Dev Index .

Intuit Consumer Group

The Intuit team of Jeff Ludwig, Chief Architect and Larry Raab, Architect of the Consumer Group joined us to share how use Splunk in production support operations. Jeff leads the Consumer Group’s Connected Services Development for electronic and print tax and payroll filings for TurboTax, ProSeries, Lacerte and QuickBooks. Larry speciali a large-scale, highly available application and systems architect responsible for the consumer group applications and infrastructure.

While the original use for Splunk at Intuit was application management, Jeff and Larry covered three additional ways they have applied Splunk including reliable monitoring, improving user experience and large-scale reporting for compliance and business intelligence.

Splunk Live London - Awesome

I’m finally getting my head above water after a tireless run up to and hectic week launching Splunk 4. The highlight of the launch for me was Splunk Live London. IMHO Splunk Live London 2009 was unrivaled as the most outstanding Splunk event yet.
We came up with this idea of getting local customers together as a way to launch Splunk 2 in June 2007. Five of us Splunkers sprinted between eight different cities in two weeks to share what was new and encourage users to exchange stories of how searching their data centers was changing life for the better. Its an exhausting way to launch a new product, but it worked so well we’ve integrated Splunk Live events into the mainstream way we do business and interact with our community. I’ve long since lost count of the number of Splunk Lives we’ve conducted all over the world including places like Cape Town, Johannesburg, Beijing, Tokyo, Singapore, Bangkok, Sao Paulo and yes once again in London.



This year’s London Splunk Live was really special. The event occurred during our launch of Splunk 4 and surpassed our expectations as the largest event we’ve ever held. More than 100 customers and users attended at the Cumberland Hotel and their swank conference facility, complete with a business canteen like breakfast experience, near Marble Arch in West London.

But the dominant reason to attend any Splunk Live are the presentations and round tables with forward thinking IT professionals who are using Splunk to transform the way they manage IT. This year we were very fortunate to have three Splunk customers who took time out of their busy schedules to come to London and share their experiences with us.

Accenture - Alexander Strobl, Technical Consultant

Alexander has been a visionary inside Accenture bringing the power of IT Search to enterprise clients in Germany where he works for Accenture as a Technical Consultant in the Data Center Technology and Opeations team. Alexander is responsible for analysis, design, roll out of Splunk. His most recent Splunk project was with a large worldwide services company with more than 50,000 employees on three continents operating mail order, distribution, e-commerce and over-the-counter-retail trade. Accenture implemented Splunk to transform the management of several technologies including Linux, virtualization and large-scale storage systems.

The project was part of an IT project to reduce the time to triage problems and improve quality of service. Challenges were:

  • no centralized access to logs and events,
  • critical IT data was stored on local file systems which were copied to central storage only once a day,
  • manual processes to locate errors,
  • no correlation between events on different services/servers and
  • development time was spend building workarounds rather than working on revenue generating applications.

All of this resulted in complex and time consuming analysis and end the end long MTTR.

The Accenture Splunk installation is currently indexing ~50GB/day including custom application files and events from 10+ integrated business critical applications and services. There are two Splunk indexes; one for testing and one for production environments and the team has established interfaces between Splunk and several other legacy data center tools.

Telenor - Henrik Strøm, Security Architect

Telenor is Norway’s largest ISP, Mobile Operator and Telco. Its one of the largest mobile operators in the world, with 160+ million customers and was founded in 1855 - 154 years ago. The company has 13.000 employees in Norway and 26.000 abroad. Telenor has been rolling Splunk out for centralized log collection and management using Syslog to forward data where it is already in place and using Splunk as a forwarder for new systems and systems with complex multi-line and/or XML structures Syslog can’t handle. Sources of data handles by Splunk include:

  • application logs (Web, Email, IPTV)
  • data center logs (server, network, storage and firewall)
  • IP backbone logs

Use cases include what Henrik refers to as digging, dashboards baselines, alerting and reporting. One of the best “digging” examples Henrik mentioned was identifying Unix Kernel Errors over the last 30 days. This kind of information routinely went unnoticed prior to Splunk’s arrival.

Another powerful use case explained by Henrik was how to baseline what is normal in your environment. For example, how many errors do you have on average for a particular type of device (routers, servers, specific applications, etc). Splunk was used to baseline normal Linux kernel behavior and found roughly 20 kernel errors per Linux running instance every 15 minutes.

The base line then allows the team to schedule simple searches to look for deviation from the baseline and send out alerts before downtime occurs from these hidden sways in behavior. In one case Splunk found thousands of errors occurring on a specific type of device, where the normal baseline was around 20!

The Telenor team also uses Splunk to identify and report on security situations that may impact their customer facing network and services. Because they are able to easily compose dashboards showing for example which Web servers are under attack and who is attacking them all in one place, the team saves Telenor from potential downtime, performance degradation or theft of data due to attacks they’ve not seen before and are missed by existing security policies and technologies.

Vodafone - Paulo de Carvalho, Network Services Manager

Paulo de Carvalho has been using Splunk at Vodafone for almost two years now. His presentation titled “Freeing Information from Organizational Silos” lifted the idea of leveraging logs and IT data out of the realm of just system administration into a thirst for higher level intelligence that crosses not only IT but also business functions. Paulo started by describing the current service oriented architecture (SOA) at Vodafone and how attempts to objectize and re-use capabilities creates incredible complexity among the services, technologies, processes, tools and people.

If Splunk Was An Animal What Would It Be?

Splunk 4 is out of the bag and the Splunk community and our customers are kicking the tires. I even saw several executives from other log management, SIEM and system management vendors registered and attended our world-wide webcast with a thousand attendees. And Twitter is all abuzz with questions, answers and some ass kicking. Yes Splunk 4 kicks ass. It is 2x faster on indexing and up to 10x faster searching. We have a fantastic new App framework where you can build custom views, dashboards and work flows and there are countless numbers of other great improvements and new features. But sometimes we don’t get it completely right and you all let us know.

But back to my question, if Splunk was an animal what kind of animal would it be?

“Odd thing animals. All dogs look up to you. All cats look down to you. Only a pig looks at you as an equal.”

- Winston Churchill

I read that quote today at the birth place of Winston Churchill and it reminded me that Splunk is like a pig. We’ve always looks our users and customers straight in the eye with the good and the not so good. This has always been the transparent way we conduct business. So keep the feedback coming - the praise and the criticism.

One of the areas that I’m especially interested in hearing about is our new App focus. We are in the very early stages of creating Splunk Apps and making them available to the Splunk community. Some are free Apps and some are premium Apps. The free apps are available for immediate download. The premium Apps you need to talk with us about so we can work with you on an installation. At some point we plan to have trial versions of the premium Apps available for download too.

The free Apps include things like

You can easily download the App .spl file, drop it into your splunk/etc/apps directory and check it out. More easily you can download and launch the Apps right from your Splunk Launcher screen (which is an App too). We’re working on fully documenting all these Apps so if you need help now feel free to contact us via support@splunk.com. You can also select “Send Feedback…” on the first menu of the App to contact the specific App team directly via email. We’re especially interested in what doesn’t work, where you get stuck and what else you’d like to see. Several of these Apps are still beta versions so feedback sooner rather than later is much appreciated.

Happy Splunk4ing!

How Much More Free Can Free Get?


Well if you ever wanted to integrate Splunk into your own product or service, free is now really, well … free. We’ve always had a free Splunk license for end users. But now we have the same for software, hardware and service provider partners. Now as a Splunk Powered Associate you can distribute Splunk with the free license key as part of your offering. You can also link to the Splunk free license download and earn referral credits if the download leads to a purchase. Pretty cool heh? Now the free license is still limited to the 500MB daily uncompressed indexing volume but hey that’s a lot of data for free.

A few of our Splunk Powered partners have picked up on the real potential here. F5 Networks, for example, has created a Splunk App that pre packages searches, alerts, reports and dashboards for F5’s ASM and FirePass products. Now F5 customers get real-time search, alerting, reporting and analytics for free with
Splunk for use with F5 Networks. Support for F5 LTM and BIG IP is coming soon.

And the folks over at RightScale are taking Splunk into the clouds. RightScale is a great cloud computing management platform that let’s you control your cloud resources across several different providers from one interface. We use RightScale at Splunk to control our demo instances on Amazon EC2/S3. Each demo instance consists of one or more servers running in the cloud that recreate a live IT environment like a J2EE-based E-commerce application, a converged network or a rack of Microsoft Windows Servers. It’s important that we are able to scale these instances up and down dynamically and RightScale comes to the rescue. The integration of
Splunk and RightScale gives cloud us the IT control and visibility we need.

Every piece of software, hardware and service on the planet generates IT data. And now you can bring Splunk to your community by integrating it into your solution at no cost to you, your channel or your customers. To join the Splunk Powered Associate program just Sign-up to be a Splunk Powered partner and we’ll take it from there.

Happy Splunking!

Splunk Lab in Asia Launches to Develop New IT Search Apps

The last two weeks I’ve been traveling throughout Asia with our new partners at Systex and the Splunk Asia team. In Singapore, Hong Kong, China and Taiwan we met with government agency, high tech manufacturing, insurance, online gaming and managed service provider customers who told us how critical Splunk is to their IT organizations, especially as budgets get even tighter.

Systex is now our master distributor covering Taiwan, China, Hong Kong, Singapore, Thailand and Malaysia. Systex is an amazing company fueled by Taiwanese entrepreneurship, creativity and innovation. The company is part distributor, part reseller, part system integrator and part independent software developer. The 2,900 Systex employees are led by CEO Hilo Chen and COO Frank Lin. Hilo did a stint at Yahoo! Asia before joining Systex as CEO. He is a very friendly, engaging and good nature executive who commands the passion of his team. Frank is detail oriented and intense and he has an ability to focus on what seems to be the impossible and get it done.

I’m not used to people pushing faster than I do, but the Systex team are reminding me what start-up speed is all about.

The Systex system integration and software business is fueled by more than 1,400 engineers with deep domain expertise in financial trading and banking systems, network security, database administration, storage, virtualization, disaster recovery, IT service management, telecommunications OSS/BSS, unified communications, business intelligence and more. This past week we unleashed the creativity of more than 400 of those engineers, product managers, sales personnel and business unit heads. We met at a three day kickoff event for the launch of a joint Splunk Lab designed to come up with new areas to apply IT Search and new Splunk Apps for a variety of use cases.

It is our hope that our joint work together will result in lots of new Apps available for download by Splunk users all over the world.

The event started Thursday with a press conference at the Westin in Taipei. We were joined at the press conference by more than three dozen press covering innovation in Asia. We discussed the design of the partnership, the Splunk Lab and some of the joint customers including Allianz Insurance, IAH Games, and The Malaysian Prime Minister’s Office. Allianz is using Splunk to report on F5 Big IP load balancer activities. IAH is mining their online multi-player game events and logs for insight into user patterns and activities including market basket analysis across different game properties. The Malaysian PM’s office uses Splunk to secure their email messaging system.

The press asked some very good questions about various use cases and our strategy for accelerating activities in Asia with Systex. Richard Tang and Johnny Lin attended the event from Systex as well and provided a great overview of how the Splunk Lab is coming together and what kind of solutions Systex is creating around Splunk. Richard has been very patient with me and has taught me enough Mandarin to completely embarrass myself during my last few visits.

On Friday 260 engineers and product managers attended an all day Splunk Boot Camp at the Systex UCOM training center in downtown Taipei. The day was divided into two three and a half hour sessions. Each session covered using, administering and deploying Splunk. There was a brief section on developing Splunk Apps including building of a network management application.

One of the product managers commented to me at the end of the day, “My mind is broken on Splunk, there is so much you can do with it.”

Saturday’s session was the Splunk Lab kickoff event and creative activity attended by 300 business unit heads, sales people, product managers and field sales engineers. I was amazed. We went from 8:30am to 6:30pm on a Saturday. The level of energy was unlike anything I’d ever experienced before. Taking the long trip back from Taipei by way of Tokyo, I am just in awe at how two organizations half a world a part have so tightly bonded in just six months. I’m very impressed by the Taiwanese work ethic and dedication.

Kord Campbell, Splunk’s Director of Developer/ISV program gave a great talk on developing Splunk Apps to start the working round tables. Each business unit (twelve in all) spent three hours coming up with ideas for Splunk in their unit including what Splunk Apps they were going to create and which customers they were targeting. The areas included

  • Financial Trading Platforms
  • Banking and ATM Systems
  • Database Serivces
  • Information and Security
  • Business Continuity and Disaster Recovery
  • Customer Service
  • Data Management & Integration
  • Unified Communications
  • IT Service Management
  • Education & Training

Teams were judged on several factors including creativity, feasibility, significance to current business and target customer profiles.

The winning team didn’t use slides but instead acted out their presentation in a 15 minute skit. It was wild and reminded me of how dysfunctional most IT organizations are today. Not that we needed reminding :-)

The Financial Services Business Unit was judged the winner. This team has developed market trading platform software in a joint venture with Reuters and explored using Splunk with their quotes and trading solutions and for market compliance. The first scenario involved monitoring TAIFEX, TWSE and OTC trades and examine patterns indicating potential fraudulent activities.

The second scenario showed how IT Search can be applied to troubleshooting the electronic system including buy side, sell side, cash position, web interfaces, trading systems and risk management. Actors in the scenario ranged from investors, web infrastructure managers, dealer groups, trading managers, CRM users and back office personnel. The team called their solution “A Lighthouse in the Dark.”

Perhaps the most interesting integration of Splunk though was the mining of data from the web application platform to determine which features users tapped into and which ones they tried once but never went back to. By examining page views for new functions and correlating those with trade volume deltas the team can continuously monitor the revenue effects of application and site changes.

The Splunk Lab launch has us thinking about how to get other people collaborating to build new applications for IT Search. We’re planning to launch a public site soon that will allow domain experts from all over the world to work together and create great Splunk Apps. So we decided to take the elevator to the top floor of Taipei 101, the world’s tallest building to look for more…


Top Floor at Taipei 101


View to the East of Taipei

Press Conference


Frank Lin, COO, Systex


Me


Robert Lau - Splunk & Emy - Systex


Hilo Chen, CEO, Systex


UCOM Technical Training Center

Kord Campbell - Splunk


Splunk Lab Team Competition


Winning financial services App


A little bit of fun

Taipei 101 - World’s Tallest Building

Splunking VMware virtualization at VMworld

This week things were rocking and we were splunking at VMworld. VMware launched their road map for their Virtual Data Center Operating System (VDC-OS). VDC-OS is VMware’s vision to aggregate virtualized servers, storage and network resources into a common platform that manages resources for guest operating systems and applications. And we launched Splunk for VMware. It’s an application build on top of Splunk that gathers data from from different levels of the VMware virtual stack including the hypervisor configuration, metrics and events, the host operating system, underlying network and guest OS and applications. The application also gives you predefined searches, alerts and reports to troubleshoot and secure your VMware environment. It’s free and you can download it here.

VMware VDC and Splunk for VMware

VDC-OS represents a big leap forward in managing the complexity virtualization hoists upon us. Finally vendors like VMware and Microsoft (will soon ship their own System Center Virtual Machine Manager) admit managing complex combinations of virtual resources is difficult and important. This is great for monitoring the hypervisor and virtual guest sessions, but what about the resident guest operating systems or applications? Its still impossible to correlate activity and performance at an application level with resource utilization and performance down to the bare metal

While these vendors are focused on deploying and tracking the resources themselves, Splunk focuses on providing visibility into the complex interactions and dependencies within a virtual infrastructure. Splunk finds, collects and persists the otherwise perishable log, event and configuration data from dynamic virtual instances as they come and go. Splunk correlates data across tiers in the virtual stack — both inside and outside the hypervisor and guests including the physical servers, hypervisor, VMs, and deployed applications,.

When you point your web browser to the Splunk for VMware application you’ll notice several dashboards already created.

  • VM Metrics Dashboard - a view of the last hour’s memory and CPU utilization across all running VMs so you can pinpoint hot spots.
  • VM Status Dashboard - current configuration, available storage and other key status indicators from different tiers including hypervisor; access & weblogic logs from deployed applications within the guest OS; perfmon, ps and top from the guest OS’s.
  • VM Searches Dashboard - all searches, alerts and reports included with Splunk for VMWare.

You’ll see on the searches dashboard a number of investigation searches that correlate the VMWare API data with OS data from within the guests to perform complex investigations in a single step. This dashboard also shows you the details of predefined alerts like looking for guests with heartbeats, looking for storage capacity problems, and other common issues.

As concepts like VMware’s VDC-OS become reality (some time in 2009 according to VMware) having the ability to trace transactions through a virtual infrastructure will become even more important. Every layer of management and abstraction (and yes that’s what virtualization is) means more complexity to manage. Just as with previous VMware products, VDC-OS will not manage physical hardware that has not been virtualized. And understanding how the virtual infrastructure is interacting with non-virtualized servers, storage and networks will remain a critical requirement.

Check out Splunk for VMware and let us know what you think and how we can continue to build on it together.

Splunk Developer Camp 2008

It’s Sunday night before the start of our first ever Splunk Developer Camp. Never before have we invited developers from our community at large to participate in sharing their ideas about building Splunk Apps and learning about all the cool stuff in our upcoming releases. I think I can speak for everyone at Splunk when I say we are truly amazed with the level of interest and participation. We’ve had to move the venue three times now to accommodate the growing list of participants and while we initially expected the mix would be mostly existing customers, we’re really pleased with the mix of developers coming tomorrow.

  • 125 Developers
  • 91 Organizations
  • 26 Industries
  • 9 Countries

Only a third of the developers showing up are customers. The rest are system integrators, MSPs, OEMs, ISVs and VARs.


Post Camp Update

We’ve organized the day into a combination of an un-conference format with developer round tables, sneak peaks of future versions of Splunk, demos, demos, demos from customers and partners and training on the Splunk API and SDKs. Our goal for the day was to both educate campers on how to effectively build Splunk apps and to get everyone jacked up about the possibilities. We broadcast the sessions live on Splunk TV.

The day started with a quick intro by me. I gave everyone a brief Splunk history lesson of the past five years and demos of the Splunk for PCI and Splunk for Server Virtualization applications. I wrapped with a discussion of our strategy to seed Splunk everywhere and to enable developers to distribute their applications to Splunk installations around the world in the near future. More on this in a future post.

Erik Swan and Rob Das, my two co-founders followed with a more in-depth evolution of Splunk chat which many focused on all the weird prototypes and company names we thought of before the real Splunk. Some of it is funny and some down right scary. Amazing what guys out of a job can come up with.

Konfabulator Follow Along

Next up Kord Campbell, Director of our Developer Program gave an overview of agenda for the day and reviewed how to register with the Konfabulator and follow along with the many demos up on our SplunkLabs EC2 server at Amazon Web Services. This worked great as everyone could build and run the demos on their own EC2 instance. Kord also showed off the new Splunk Wiki for developers and application users. We’re in the process of moving all our documentation to the wiki as a one stop shop for information on using, administering, deploying and developing for Splunk. A few other Kord matters included the review of our new Developer Program additions including a 2GB Developer Enterprise License for registered developers.

Splunk Apps

Jef Bekes, our Head Designer and Raffy Marty our Application Product Manager then gave a very inspiring talk about the future of Splunk and Splunk Apps. The basic point being in Splunk 3.3 today there is no sense of application context. This means the same default user-interface for all applications and that all knowledge (saved searches, alerts, reports etc.) is shared across all installed apps. It’s impossible also to “switch” from one app to another. Splunk 4.0 attempts to address this whole problem by making applications first class objects that can be containers for collections of other objects at the interface, knowledge and configuration layers. As more an more Splunk applications arrive on the scene this encapsulation becomes increasingly important. Jef and Raffy showed a sample Splunk 4.0 Help Desk application that included custom branding, restricted task-based navigation and structured search user interfaces and results views. Other Splunk 4.0 features were reviewed too; Splunk Web gadgets, the Application builder, improved charting and content grouping.

Developer Platform and API

The Splunk Developer Platform futures was up next with Tom Donahoe, Splunk Product Manager and Johnvey Hwang Lead UI Developer. Topics included the Splunk 4.0 improvements like Application Builder, REST API Additions, UI Extensibility and SDK Support. The Application Builder eases application creation and packaging dramatically improving the experience beyond where Splunk 3.3 currently stands. The Application Builder will be available in both command-line and GUI to provides application configuration isolation and leverage file system security controls. Johnvey reviewed with us planned REST API additions for 4.0 like

  • Alerting: history, status, improved generation
  • Notifications: email, RSS
  • Search scheduling management
  • Knowledge management
  • Authentication: users, roles, single sign-on
  • Distributed: topology data, server metrics

Splunk Ninja

The Splunk Ninja (aka Michael Wilde) graced us with a visit and showed off his demo Godness with a Zero-to-Lightspeed set-up and data eating with the new Splunk Crawl feature in 3.3. Sweet!

Search Language

David Carasso, a Senior Developer and Alex Raitz one of our Solution Architects did a fantastic overview of the Splunk search language and ran through some really cool examples of powerful stuff like

  • What’s the most important hard disk error on each of my hosts?
  • Who sent me the most email?
  • How long do users stay on my website?

David showed us how to create our own search commands too. Awesome stuff.

Large Scale Reporting and Summary Indexing

Steven Sorkin, Head Indexing Geek led a wonderful talk on large scale reporting using great examples like finding violations in security data on application layer firewalls and routers. He covered how we use map/reduce models to summarize batches of events - what we call summary indexing. It turns Splunk into a sort-a time slinky.

REST/ATOM API and Splunk Gadgets

New Splunk Apps Launch at Interop and MMS

logo_interoplv2008_large.png

logo_mms_large.png
This week we were rolling in Las Vegas with Interop at one end of the strip and the Microsoft Management Summit at the other end.

At Interop we launched the Splunk for Change Management app. And at MMS the Splunk for Windows Management app made it’s debut.

Both apps make use of the Splunk Platform which provides a common set of services and APIs making it easy to create and integrate applications that leverage vast amounts of IT data. These are the second and third applications in a series of new releases we’ll be doing this year.
Splunk for PCI was the first app launched last quarter.

Splunk for Change Management App

Splunk for Change Management takes advantage of the fact that we index not just logs but configurations and file system changes as well. It also leverages a little known (but I think soon to be much more popular) Splunk search command called diff. Diff lets you easily compare two search results and returns a single result that is the different between the two. You can compare values of specific fields of results as well as every line of multi line events and files. This makes it really easy to compare configurations across lots of locations. Splunk for Change Management leverages these capabilities and brings integrated change audit, change detection and change validation.

Now your can detect unauthorized changes by indexing your trouble tickets and ticketing system logs together with your service, device and application events and configurations. We use Jira internally and find indexing our Jira tickets enables us to immediately know if a change was authorized or not. No more jumping between redundant and siloed consoles searching for the answer or writing all kinds of complicated data transformation scripts to compare the output of different management systems.

And for the first time we introduce to the industry the concept of Change Validation. Today many of us have the ability to blast out patches to hundreds of servers and device automatically. But how do we know that the changes had the desired effect? By observing the state and events generated by the actual patched systems we can now compare the before and after actual behavior. Splunk brings change audit events and configuration data together with activity and error logs so you can connect change with actual system and user behavior.

The app includes:

  • Out-of-the-box dashboards with over 40 reports showing changes across all datacenter components including applications, servers and network devices.
  • Predefined alerts that detect unauthorized change on the basis of configuration variances and correlation with service desk systems.
  • Predefined searches to help identify service-impacting changes quickly.
  • Integration with service desk systems to close the loop on change management by validating the effect of change on system behavior.

Splunk for Windows Management App

This new app integrates Microsoft’s System Center Operations Manager’s command-and-control view of a Windows infrastructure with Splunk’s IT Search. The latest version of Splunk now indexes all IT data generated by Windows servers and applications — event logs, registry keys, performance metrics and application log files. Everything is searchable from a single place to resolve service-impacting incidents faster, enhance monitoring coverage, and validate service levels.

What’s really cool is Splunk searches can be launched through Tasks in the System Center Operations Manager Console on any aspect of the infrastructure being monitored, and can be expanded to include far-flung elements of the IT infrastructure for additional context – regardless of platform or technology. Its super fast to identify information across the Windows Event Log, the Windows

The Splunk Platform Has Launched

Without a doubt the past week has been the most amazing week in Splunk history. The crazy coast to coast multi-city launch left us all exhausted and electrified. A few of the things that stick in my mind…

First Splunk 3.2 including Splunk for Windows went live on our download page last Saturday and more than 40% of our downloads in the past week have been for our new Windows version. Then Nick Selby of 451 Group wrote an analyst brief on us. He said, “Splunk is awesome: it’s multiplatform, easy to install and easy to use. And with an abstraction layer of logs, configuration files and system messages, traps and alerts, it’s seriously useful.” 451 has a reputation for ripping vendors, so we’re flattered.

Dana Gardner, analyst with Interarbor wrote a very eloquent analysis of our platform launch on ZD Net. “Splunk has created the means to offer developers easy access to that data and the powerful inferences gleaned from comprehensive IT search. That means the data can go places no log file has gone before,” says Dana. Developers are certainly doing some way cool things with Splunk.

I’ve seen a couple of neat visualization applications including this one called Replay. It shows you a live or time lapsed view of your event streams. Here you can see the replay application hooked up to our internal wiki showing who’s doing what over a 24 hour period. Click on the image for the movie.

replay.png

As for our own applications, the Splunk for PCI app drew tremendous interest at our series of Splunk Live events this past week. It’s just one example of how a business person with domain knowledge can package their own Splunk configuration as an application. If you haven’t seen Raffy’s video on the PCI Application, check it out here.

pci.png

We also showed the Splunk for Change Management application as well. Seeing someone touch a file and watching the Splunk dashboard update instantaneously is an awesome display of how flexible Splunk has become. Check out the developer program for yourself and get your goods up on SplunkBase so we can all check em out.

changemgmt.png

Welcome!

I’m Michael Baum. Welcome to my blog.

I hope to find time to write about some of my favorite topics including:

  • Splunk and IT Search.
  • Technology gadgets and software — the stuff we all like to use.
  • Datacenter applications, servers, networks and security — the stuff we all have to keep running.
  • Business, entrepreneurship and venture capital.
  • Wall street and investing.

Comments are always welcome and you can also reach me via email at thebaum (at) splunk (dot) com.