Splunking Sensor Data with Arduino and HTTP Event Collector

It’s been (relatively) chilly in the SF office the last few weeks, but since “how I feel” is rather subjective, I figured it would be an excellent chance to both gather some empirical evidence and try out the new Splunk HTTP Event Collector! In this post I will walk you through setting up an Arduino with an Ethernet shield and a temperature sensor to log data directly to Splunk.

Ingredients

DHT11

http://www.adafruit.com/products/386

Arduino Duemilanove

https://www.arduino.cc/en/Main/ArduinoBoardDuemilanove

Ethernet Shield (the older model, but the newer one should work too)

https://www.arduino.cc/en/Main/ArduinoEthernetShield

Arduino Sketch (code)

https://github.com/Dishwishy/splunkduino

Splunk 6.3 (even the free one!)

https://www.splunk.com/en_us/download.html

Wiring It Up (Fritzing Diagram)

[Image: Fritzing wiring diagram]

Setting Up Splunk HTTP Event Collector

First things first, let’s set up the HTTP Event Collector to receive data. I advise RTFM here:

http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/UsetheHTTPEventCollector

However, the tl;dr is: Settings > Data Inputs > HTTP Event Collector > New Token.

The settings I used when creating my token are listed below. You can set this up however you want, but be aware that you’ll then need to alter parts of the HTTP payload in the Arduino sketch to match:

Token Settings

Name: Arduino

Source type: (new) arduino

Index: (new) arduino

[Image: HTTP Event Collector token settings]

Global Settings

All Tokens: Enabled

Enable SSL: Disabled (unchecked)

HTTP Port Number: 8088

[Image: HTTP Event Collector global settings]

NOTE: SSL is a great security option and I recommend having it on by default, but in this particular case I didn’t want to deal with it on the Arduino (client) side, so I left it off for now.

You will then be presented with a token that we will use in the sketch.

[Image: HTTP Event Collector token list showing the new token]

As an aside, you don’t need to set a source type or a special index; I chose to do that to make the example code more thorough. More on that later.
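Before touching the Arduino, it can help to know exactly where the collector should be listening. The short C++ sketch below (an illustration, not part of the splunkduino repo) builds that URL from the global settings above: port 8088 with SSL disabled means plain http://. It assumes the /services/collector endpoint described in the HTTP Event Collector docs linked earlier, and collectorUrl is a hypothetical helper name.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: builds the HEC endpoint URL from the global settings.
// Assumes the /services/collector endpoint from the HEC documentation.
std::string collectorUrl(const std::string& host, std::uint16_t port, bool sslEnabled) {
    // With "Enable SSL" unchecked, the collector speaks plain HTTP.
    const std::string scheme = sslEnabled ? "https" : "http";
    return scheme + "://" + host + ":" + std::to_string(port) + "/services/collector";
}
```

For example, with a (hypothetical) Splunk server at 192.168.1.50, collectorUrl("192.168.1.50", 8088, false) yields http://192.168.1.50:8088/services/collector, which is handy for a quick test from a laptop before flashing anything.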

The Code

Now that Splunk is set up, let’s move on to the code (sketch). As shown in the screenshot below, you’ll need to update the “IPAddress server(x,x,x,x)” value on line 45 with the IP address of your Splunk instance. Note that commas (NOT periods) separate the octets!

[Image: sketch source showing the IPAddress server line]
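The commas matter because the Arduino IPAddress constructor takes the four octets as separate numeric arguments rather than as a dotted string. As a rough illustration in plain C++ (parseIPv4 and ipAddressLine are hypothetical helpers, not from the sketch), this converts a dotted address into the form used on line 45:

```cpp
#include <array>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Splits a dotted-quad address such as "192.168.1.50" into four octets.
// Returns std::nullopt if the text is not a well-formed IPv4 address.
std::optional<std::array<std::uint8_t, 4>> parseIPv4(const std::string& text) {
    std::array<std::uint8_t, 4> octets{};
    std::istringstream in(text);
    for (std::size_t i = 0; i < octets.size(); ++i) {
        int value = -1;
        if (!(in >> value) || value < 0 || value > 255) return std::nullopt;
        octets[i] = static_cast<std::uint8_t>(value);
        if (i < 3) {  // the first three octets must be followed by a period
            char dot = 0;
            if (!(in >> dot) || dot != '.') return std::nullopt;
        }
    }
    char extra = 0;
    if (in >> extra) return std::nullopt;  // reject trailing text such as ".5"
    return octets;
}

// Formats the octets as comma-separated constructor arguments,
// matching the "IPAddress server(x,x,x,x)" style in the sketch.
std::string ipAddressLine(const std::array<std::uint8_t, 4>& o) {
    return "IPAddress server(" + std::to_string(o[0]) + "," + std::to_string(o[1]) + "," +
           std::to_string(o[2]) + "," + std::to_string(o[3]) + ");";
}
```

Note that parseIPv4 rejects "192,168,1,50": the comma form is only valid as constructor arguments in the sketch, not as an address string.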

We also need to update the token value on line 120. This is the token we generated earlier when setting up our input, prefixed with “Authorization: Splunk ”. The other box highlighted in red (“Content-Length”) is mandatory too. As far as I could test, no other HTTP headers (Host, User-Agent, etc.) were needed, but valid “Content-Length” and “Authorization” headers are required.

[Image: sketch source with the Authorization and Content-Length headers highlighted in red]
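To make the header requirements concrete, here is a minimal C++ sketch of the raw request the Arduino has to send. It is not the code from the splunkduino repo; the helper name buildHecRequest, the /services/collector path, and the Host header are my assumptions (I found Host optional in testing, but HTTP/1.1 expects it and it does no harm). The key detail is that Content-Length is computed from the payload itself, so it cannot drift out of sync with the body.

```cpp
#include <string>

// Hypothetical helper: assembles a raw HTTP/1.1 POST for the HTTP Event Collector.
std::string buildHecRequest(const std::string& host, const std::string& token,
                            const std::string& payload) {
    std::string request;
    request += "POST /services/collector HTTP/1.1\r\n";
    request += "Host: " + host + "\r\n";
    // The token from the HEC setup, prefixed with "Splunk ".
    request += "Authorization: Splunk " + token + "\r\n";
    // Content-Length is the body size in bytes; std::string::size() counts bytes.
    request += "Content-Length: " + std::to_string(payload.size()) + "\r\n";
    // An empty line separates the headers from the body.
    request += "\r\n";
    request += payload;
    return request;
}
```

On the Arduino itself, the same text would be written out through the EthernetClient (for example with client.print()) after connecting to port 8088.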

Getting back to my earlier comment about the “payload” we are sending to Splunk and its required and optional fields: on line 110 of the sketch you’ll see a list of key-value pairs in JSON format.

[Image: sketch source showing the JSON payload]

You do not have to specify the source type or index. I chose to define those ahead of time, when creating my HTTP Event Collector input and token, to make searching and segregating the data easier. In general, I think putting more effort into categorization up front pays off, since updating firmware is not super fun when a sensor is in a hard-to-reach place. If simplicity is your thing, though, you can create an input with the defaults and trim that portion out of the payload string. The rest of the code is commented, but if you have any questions, just hit me up in the comments section.
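Here is one way the payload could be assembled so the optional fields are easy to trim. The event, sourcetype, and index keys are standard HTTP Event Collector fields; the keys inside event (temperature_c, humidity_pct), the Reading struct, and buildPayload are hypothetical names for illustration, and the values are assumed not to need JSON escaping. Passing an empty string drops that field, leaving the token’s defaults in charge.

```cpp
#include <string>

// Hypothetical DHT11 reading; the DHT11 reports whole-number values.
struct Reading {
    int temperatureC;
    int humidityPct;
};

// Builds the HEC JSON envelope. Empty sourcetype/index strings are omitted,
// so Splunk falls back to whatever the token was configured with.
// Assumes the strings contain nothing that needs JSON escaping.
std::string buildPayload(const Reading& r, const std::string& sourcetype,
                         const std::string& index) {
    std::string json = "{\"event\":{\"temperature_c\":" + std::to_string(r.temperatureC) +
                       ",\"humidity_pct\":" + std::to_string(r.humidityPct) + "}";
    if (!sourcetype.empty()) json += ",\"sourcetype\":\"" + sourcetype + "\"";
    if (!index.empty()) json += ",\"index\":\"" + index + "\"";
    json += "}";
    return json;
}
```

With the token settings used in this post, buildPayload({21, 40}, "arduino", "arduino") produces {"event":{"temperature_c":21,"humidity_pct":40},"sourcetype":"arduino","index":"arduino"}.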

Time To Splunk It

Now I can run a search against my arduino index and watch the data arrive in real time (one event roughly every 15 seconds).

[Image: live search results from the arduino index]
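The roughly 15-second spacing comes from the sketch’s timing loop. A common Arduino pattern, similar to the WebClientRepeating example linked at the end of this post, compares millis() against the time of the last post using unsigned subtraction, so the check keeps working when millis() wraps around after about 49.7 days. Here it is in portable C++ with millis() replaced by a plain argument; shouldPost and kPostingIntervalMs are hypothetical names, and I have not verified that the splunkduino sketch does it exactly this way.

```cpp
#include <cstdint>

// Hypothetical posting interval of ~15 s, matching the spacing seen in search.
constexpr std::uint32_t kPostingIntervalMs = 15000;

// nowMs plays the role of Arduino's millis(), a 32-bit counter that wraps.
// Unsigned subtraction yields the correct elapsed time even across the wrap.
bool shouldPost(std::uint32_t nowMs, std::uint32_t lastPostMs) {
    return static_cast<std::uint32_t>(nowMs - lastPostMs) >= kPostingIntervalMs;
}
```

Comparing now > last + interval instead would misbehave near the wrap, because last + interval can overflow to a small number.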

I can even chart it over time very easily, thanks to the JSON format and Splunk’s stellar support for JSON KV pairs!

[Image: chart of the sensor data over time]

Debugging – Lessons Learned & Improvements

1) HTTP headers matter. Specifically, the “Content-Length” header is needed; without it, Splunk does not know how large the message is and will truncate the data. Note the WireShark trace below. A blank line is also needed between the headers and the payload.

[Image: WireShark trace of a request missing the Content-Length header]
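When a trace shows truncated data, two quick checks are whether a blank line ends the headers and whether the declared Content-Length matches the body that was actually sent. This small C++ helper (contentLengthMatches is a hypothetical name) does both on a raw request copied out of a capture. For brevity it matches the header name case-sensitively, whereas real HTTP header names are case-insensitive.

```cpp
#include <cstddef>
#include <string>

// Returns true only if the headers end with a blank line and the
// Content-Length header equals the byte count of the body that follows.
bool contentLengthMatches(const std::string& raw) {
    const std::string separator = "\r\n\r\n";
    const std::size_t split = raw.find(separator);
    if (split == std::string::npos) return false;  // headers never terminated

    // Prepend CRLF so every header line, including the first, starts with CRLF.
    const std::string headers = "\r\n" + raw.substr(0, split);
    const std::string body = raw.substr(split + separator.size());

    const std::string key = "\r\nContent-Length:";  // case-sensitive for simplicity
    const std::size_t at = headers.find(key);
    if (at == std::string::npos) return false;  // header missing entirely

    std::size_t i = at + key.size();
    while (i < headers.size() && headers[i] == ' ') ++i;  // skip optional spaces
    std::size_t declared = 0;
    bool sawDigit = false;
    while (i < headers.size() && headers[i] >= '0' && headers[i] <= '9') {
        declared = declared * 10 + static_cast<std::size_t>(headers[i] - '0');
        sawDigit = true;
        ++i;
    }
    return sawDigit && declared == body.size();
}
```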

2) WireShark is your friend. Debugging was easiest by simply running WireShark on my Splunk system to capture the requests. Splunk also has an introspection log, and you could possibly use Splunk Stream to accomplish similar debugging; I am just more familiar with WireShark.

[Image: WireShark trace of a successful request]

3) We could use DNS to locate our Splunk server, since DNS support is already in the Arduino Ethernet library, but my test environment did not reliably resolve my hostname, so I just hardcoded the IP address. If you’re feeling up to changing the sketch, feel free; a good example sketch to reference ships with the Arduino IDE: https://github.com/arduino/Arduino/blob/master/libraries/Ethernet/examples/WebClientRepeating/WebClientRepeating.ino