Splunk and Synthetic Monitoring

Monitoring a web application is not always easy, and the challenge grows when you want to monitor it proactively. How can you detect application performance problems before your users do? And how do you monitor the availability of a SaaS application when these environments are typically locked down? You can’t install an agent, and you rarely have access to the instance log files, which limits your visibility into the application.

A good solution to these challenges is synthetic monitoring. In short, synthetic monitoring simulates user interactions with your web application, which lets you measure the application’s performance and availability:
http://en.wikipedia.org/wiki/Synthetic_monitoring

The question now is: how can we use Splunk to do synthetic monitoring? There are a few solutions out there. Some you have to pay for, and others you have to code yourself, which makes things more technical and challenging.

A solution that first comes to mind involves Selenium. Selenium is an open source browser automation tool. It is typically used for functional testing and sometimes for administrative automation. Selenium supports the major browsers, such as Firefox, IE, Opera and Safari, as well as mobile device automation. What makes Selenium more compelling is its IDE (installed as a Firefox add-on), which you can use to record your user interactions and generate the corresponding script in various languages such as Java, Ruby, C# or Python. This removes the complexity of writing the script from scratch and turns the experience into a “record-replay” one.

Now how can we use Selenium to automate the user interaction and leverage Splunk’s powerful machine data platform and visualization layer to monitor application performance and availability?

The idea that came to mind is to create an app that lets a Splunk user take any script recorded in Selenium IDE, export it to Python and use that script in Splunk. Selenium simulates the user interactions, and Splunk measures various metrics and takes care of the visualization.
The Splunk app needs to be able to invoke the Selenium scripts but, most importantly, calculate metrics such as transaction response time and network latency, and capture the browser used in the emulation, the location, etc.
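
As a rough illustration of the idea above, here is a minimal sketch of how a script could time one transaction and print a single key="value" event. Splunk scripted inputs index whatever a script writes to standard output, so printing the event is enough to get it indexed. The names `timed_transaction` and `format_event`, and every field name, are hypothetical and chosen for this example only; this is not the API of the app’s SplunkTransaction module. The `time.sleep` call stands in for the steps of a script exported from Selenium IDE.

```python
import platform
import socket
import time
from contextlib import contextmanager


def format_event(fields):
    """Render a dict as a single line of key="value" pairs."""
    return " ".join('{}="{}"'.format(key, value) for key, value in fields.items())


@contextmanager
def timed_transaction(name, browser="unknown", location=None, emit=print):
    """Time the enclosed block and emit one event describing the run.

    Hypothetical helper for illustration; field names are assumptions.
    """
    start = time.perf_counter()
    status = "success"
    try:
        yield
    except Exception:
        status = "failure"  # any exception in the scripted steps counts as a failed run
        raise
    finally:
        emit(format_event({
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
            "transaction": name,
            "status": status,
            "duration_seconds": "{:.3f}".format(time.perf_counter() - start),
            "browser": browser,
            "os": platform.system(),
            # Fall back to the host name when no location label is given.
            "location": location or socket.gethostname(),
        }))


if __name__ == "__main__":
    with timed_transaction("login", browser="firefox", location="paris"):
        time.sleep(0.1)  # stand-in for the recorded Selenium steps
```

Wrapping the recorded steps in a context manager keeps the timing and error handling out of the exported script, so a failed step still produces an event with status="failure" before the exception propagates.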

Introducing the Splunk App for Synthetic Monitoring. The app’s link on apps.splunk.com:
https://apps.splunk.com/app/1880/

So how does the App work?
In a nutshell:
• Automation scripts are recorded in Selenium IDE (Firefox add-on)
• The Splunk App for Synthetic Monitoring lets you measure end-user performance and transaction response times
• Performance can be measured from different locations, browsers and operating systems

 

What does the app include?

• A Python module (SplunkTransaction) that measures response time, records the OS, browser type, location, etc., and indexes this data
• Sample scripted inputs that replay Selenium scripts and can be used as examples
• Dashboards to capture and compare different metrics
• Scripted input to measure network round trip time and latency
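
To give a feel for what a network round-trip scripted input might look like, here is a minimal sketch that times a TCP connection to the application host and prints one key="value" event. This is an assumption about one possible approach, not the app’s actual implementation: TCP connect time only approximates network round-trip time (and includes DNS resolution when a host name is used). The function and field names are hypothetical.

```python
import socket
import time


def tcp_round_trip_ms(host, port, timeout=3.0):
    """Return the time in milliseconds to open a TCP connection, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # the connection is closed again as soon as it is established
    except OSError:  # covers timeouts, refused connections and DNS errors
        return None
    return (time.perf_counter() - start) * 1000.0


def latency_event(host, port, samples=3):
    """Probe the host several times and summarize the result as one event line."""
    results = [tcp_round_trip_ms(host, port) for _ in range(samples)]
    succeeded = [r for r in results if r is not None]
    failed_pct = 100.0 * (samples - len(succeeded)) / samples
    avg_text = "{:.1f}".format(sum(succeeded) / len(succeeded)) if succeeded else "NA"
    return 'host="{}" port="{}" samples="{}" failed_pct="{:.0f}" avg_rtt_ms="{}"'.format(
        host, port, samples, failed_pct, avg_text
    )


if __name__ == "__main__":
    # "example.com" is a placeholder; use the host of the monitored application.
    print(latency_event("example.com", 443))
```

The share of failed probes is only a coarse stand-in for packet loss, since a failed connection attempt can have other causes.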

So how does it work from an App architecture perspective?

• Automation scripts are executed on Splunk universal forwarders, so the scripts run from the various locations where the forwarders are installed
• Forwarders are ideally located in the same geographies as the application’s end users
• Scripted inputs are used to schedule script execution
• The Splunk deployment server can be used to:
– Distribute scripted inputs across the universal forwarders
– Push configuration and the Selenium WebDriver to the universal forwarders

 

Now, what types of metrics can we capture with this app? The following list is just a subset:

Performance by Geography

• Errors by geography
• SLAs by geography
• Availability by geography
• Performance by geography
• Network Latency by geography
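
To make the arithmetic behind these per-geography metrics concrete, here is a minimal sketch that groups transaction events by location and computes error count, availability, SLA compliance and average response time. In practice a Splunk search and dashboard would do this grouping; the Python below only illustrates the calculation. The event fields, the SLA threshold and the sample values are made up for this example, and counting a run as SLA-compliant only when it succeeds within the threshold is one possible definition, not necessarily the app’s.

```python
from collections import defaultdict


def summarize_by_location(events, sla_seconds):
    """Group events by location and compute simple availability and SLA figures."""
    groups = defaultdict(list)
    for event in events:
        groups[event["location"]].append(event)

    summary = {}
    for location, runs in groups.items():
        succeeded = [r for r in runs if r["status"] == "success"]
        # Assumed SLA definition: the run succeeded and finished within the threshold.
        within_sla = [r for r in succeeded if r["duration"] <= sla_seconds]
        summary[location] = {
            "runs": len(runs),
            "errors": len(runs) - len(succeeded),
            "availability_pct": 100.0 * len(succeeded) / len(runs),
            "sla_pct": 100.0 * len(within_sla) / len(runs),
            "avg_duration": (
                sum(r["duration"] for r in succeeded) / len(succeeded)
                if succeeded else None
            ),
        }
    return summary


if __name__ == "__main__":
    # Made-up sample events, for illustration only.
    sample = [
        {"location": "paris", "status": "success", "duration": 1.2},
        {"location": "paris", "status": "success", "duration": 3.4},
        {"location": "paris", "status": "failure", "duration": 10.0},
        {"location": "tokyo", "status": "success", "duration": 0.8},
    ]
    for location, stats in summarize_by_location(sample, sla_seconds=2.0).items():
        print(location, stats)
```

The same grouping applies to the per-application view by swapping the location field for an application name.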

Performance by Application

• Availability by application
• Performance by application
• SLAs by application
• Errors by application

 

Transaction Response Time
• Transaction Response Time
• Transaction Error Count
• Metric overlay with
– Network Latency
– Network Packet Loss

 

Why Splunk and Selenium:

• Leverage the power of Selenium as an open source automation tool
• Leverage Splunk Enterprise:
– As an analytics platform to measure and visualize various performance metrics
– To correlate data with other metrics such as:
  • Application log files to capture application errors and backend performance
  • Stream App metrics for real user application metrics
  • Network metrics
  • Infrastructure metrics

While this app has plenty of room to improve, in its current form it’s a relatively easy mechanism to synthetically monitor your application. Please send me feedback if you try it out.
Happy Splunking!

Great ideas and something I’ve thought of in the past. This would be a GREAT win if we could implement such a thing. The hesitancy I have is this: with such automated test tools, how far can you get with performance metrics? What I mean is that you can enter “timers” in these scripts, but can you program in such performance KPIs as DNS, Connect, SSL, 1st Byte, Content download, # of Bytes, W3C, etc.?

Brian Lynch
February 11, 2015

You can get transaction duration, and you can calculate availability and performance. You can also measure network round trip time, but none of the KPIs you referred to. Those are very good metrics, but you still get value from the existing capabilities.

Elias Haddad
February 11, 2015