Detecting outages caused by unauthorized changes

Splunk is a great solution for searching, investigating and monitoring your IT environment, whether the data is application, infrastructure or network related. One perplexing issue to detect is an unauthorized change. Per ITIL, an unauthorized change is a “change made to the IT infrastructure that violates defined and agreed Change policies”.

Let’s take a simple example: you have a multi-tier application, and one of the admins changed one of the configuration files without going through the CAB or the Change and Release manager for impact analysis. This config change resulted in an application outage. Using Splunk, you can easily detect the outage.

The challenge is how to isolate the problem in that case. Most importantly, how can you detect that an unauthorized change was the root cause?

To solve this puzzle you need to monitor the application log files and config files. In addition, you need to collect change tickets along with the associated hostnames so you can correlate the data. Getting the hostnames for change tickets means pulling additional data from the configuration management database (CMDB). In other words, there is plenty of data to correlate: some of it lives on the operations side and the rest lives on the IT Service Management side, which makes this a challenging task. Splunk is a leader in the IT operations space; ServiceNow is a leader in the ITSM space. Bridging the gap between the two solves a lot of IT issues.

Luckily, if you install the Splunk App for ServiceNow this task is made much easier. Once you install the app, you will have the ability to index data from your ServiceNow instance including the Change Tickets and CMDB.

Additionally, the app comes with a correlation rule that ties together data from the CMDB and Change tickets, allowing you to look for Change tickets by hostname, among many other capabilities. In the example below, we are searching for all change tickets that are related to hostname=SAP*.
[Screenshot: search for change tickets related to hostname=SAP*]

Expanding that search to combine the change ticket data above with the application log and config file data lets you spot config changes on a host that no change ticket accounts for, and tie them to the outage.
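To make the correlation concrete, here is a minimal sketch of the logic in Python, outside of Splunk. It is an illustration only, not the app's correlation rule: the record types, field names (`hostname`, `start`, `end`, `modified_at`) and the one-hour lookback are all assumptions made for this example, and real ServiceNow and Splunk data will use different fields.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ChangeTicket:
    # Hypothetical shape: a change ticket already joined to its CMDB hostname.
    number: str
    hostname: str
    start: datetime
    end: datetime


@dataclass
class ConfigChange:
    # Hypothetical shape: one detected modification of a monitored config file.
    hostname: str
    path: str
    modified_at: datetime


def find_unauthorized_changes(config_changes, tickets):
    """Return config changes not covered by any ticket window on the same host."""
    unauthorized = []
    for change in config_changes:
        covered = any(
            t.hostname == change.hostname
            and t.start <= change.modified_at <= t.end
            for t in tickets
        )
        if not covered:
            unauthorized.append(change)
    return unauthorized


def suspects_for_outage(outage_host, outage_time, unauthorized,
                        lookback=timedelta(hours=1)):
    """Return unauthorized changes on the outage host shortly before the outage."""
    return [
        c for c in unauthorized
        if c.hostname == outage_host
        and outage_time - lookback <= c.modified_at <= outage_time
    ]
```

The first function flags any config change that falls outside every change window for its host; the second narrows those to the ones that happened on the affected host just before the outage. The lookback window is an arbitrary choice here and would need tuning for a real environment.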

Being able to bridge the gap between ITSM and the Operations team again highlights Splunk’s power as a machine data platform.